Created attachment 134271 [details]
Full console trace with various SysRq outputs

Since v3.14, under normal desktop usage my s2disk/hibernate often blocks on
the saving of the image data ("Saving 506031 image data pages () ...").

With the following test I can reproduce the problem reliably:
---
0) Boot

1) Fill RAM with 2 GiB (+50% in my case)

mount -t tmpfs tmpfs /media/test/
dd if=/dev/zero of=/media/test/test0.bin bs=1k count=$[1024*1024]
dd if=/dev/zero of=/media/test/test1.bin bs=1k count=$[1024*1024]

2) Do s2disk

s2disk
---
s2disk: Unable to switch virtual terminals, using the current console.
s2disk: Snapshotting system
s2disk: System snapshot ready. Preparing to write
s2disk: Image size: 2024124 kilobytes
s2disk: Free swap: 3791208 kilobytes
s2disk: Saving 506031 image data pages (press backspace to abort) ... 0%

#Problem>: ... there it stays and blocks. SysRq still responds, so I could
trigger various debug outputs.
---

I've bisected this to the following commit:
---
commit a1c3bfb2f67ef766de03f1f56bdfff9c8595ab14 (HEAD, refs/bisect/bad)
Author: Johannes Weiner <hannes@cmpxchg.org>
Date:   Wed Jan 29 14:05:41 2014 -0800

    mm/page-writeback.c: do not count anon pages as dirtyable memory
    [...]
---

Reverting a1c3bfb2 fixes s2disk for me again - so basically I'm OK ;). But
maybe there is still another, better solution.

Attached is a full console trace with various SysRq outputs, possibly useful
for analyzing.

BR, Oliver
Arch: x86_64
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Tue, 29 Apr 2014 20:13:44 +0000 bugzilla-daemon@bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=75101 > > Bug ID: 75101 > Summary: [bisected] s2disk / hibernate blocks on "Saving 506031 > image data pages () ..." > Product: Memory Management > Version: 2.5 > Kernel Version: v3.14 > Hardware: All > OS: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: Other > Assignee: akpm@linux-foundation.org > Reporter: oliverml1@oli1170.net > Regression: No > > Created attachment 134271 [details] > --> https://bugzilla.kernel.org/attachment.cgi?id=134271&action=edit > Full console trace with various SysRq outputs > > Since v3.14 under normal desktop usage my s2disk/hibernate often blocks on > the > saving of the image data ("Saving 506031 image data pages () ..."). A means to reproduce as well as a bisection result. Nice! Thanks. Johannes, could you please take a look? > With following test I can reproduce the problem reliably: > --- > 0) Boot > > 1) Fill ram with 2GiB (+50% in my case) > > mount -t tmpfs tmpfs /media/test/ > dd if=/dev/zero of=/media/test/test0.bin bs=1k count=$[1024*1024] > dd if=/dev/zero of=/media/test/test1.bin bs=1k count=$[1024*1024] > > 2) Do s2disk > > s2disk > > --- > s2disk: Unable to switch virtual terminals, using the current console. > s2disk: Snapshotting system > s2disk: System snapshot ready. Preparing to write > s2disk: Image size: 2024124 kilobytes > s2disk: Free swap: 3791208 kilobytes > s2disk: Saving 506031 image data pages (press backspace to abort) ... 0% > > #Problem>: ... there is stays and blocks. SysRq still responds, so that I > could > trigger various debug outputs. > --- > > I've bisected this to following commit: > --- > commit a1c3bfb2f67ef766de03f1f56bdfff9c8595ab14 (HEAD, refs/bisect/bad) > Author: Johannes Weiner <hannes@cmpxchg.org> > Date: Wed Jan 29 14:05:41 2014 -0800 > > mm/page-writeback.c: do not count anon pages as dirtyable memory > > [...] > --- > > Reverting a1c3bfb2 fixes s2disk for me again - so basically I'm ok ;). But > maybe there is still another better solution. > > Attached is a full console trace with various SysRq outputs, possibly useful > for analyzing. > > BR, Oliver >
Hi, On Tue, Apr 29, 2014 at 03:24:37PM -0700, Andrew Morton wrote: > > (switched to email. Please respond via emailed reply-to-all, not via the > bugzilla web interface). > > On Tue, 29 Apr 2014 20:13:44 +0000 bugzilla-daemon@bugzilla.kernel.org wrote: > > > https://bugzilla.kernel.org/show_bug.cgi?id=75101 > > > > Bug ID: 75101 > > Summary: [bisected] s2disk / hibernate blocks on "Saving 506031 > > image data pages () ..." > > Product: Memory Management > > Version: 2.5 > > Kernel Version: v3.14 > > Hardware: All > > OS: Linux > > Tree: Mainline > > Status: NEW > > Severity: normal > > Priority: P1 > > Component: Other > > Assignee: akpm@linux-foundation.org > > Reporter: oliverml1@oli1170.net > > Regression: No > > > > Created attachment 134271 [details] > > --> https://bugzilla.kernel.org/attachment.cgi?id=134271&action=edit > > Full console trace with various SysRq outputs > > > > Since v3.14 under normal desktop usage my s2disk/hibernate often blocks on > the > > saving of the image data ("Saving 506031 image data pages () ..."). > > A means to reproduce as well as a bisection result. Nice! Thanks. > > Johannes, could you please take a look? > > > With following test I can reproduce the problem reliably: > > --- > > 0) Boot > > > > 1) Fill ram with 2GiB (+50% in my case) > > > > mount -t tmpfs tmpfs /media/test/ > > dd if=/dev/zero of=/media/test/test0.bin bs=1k count=$[1024*1024] > > dd if=/dev/zero of=/media/test/test1.bin bs=1k count=$[1024*1024] > > > > 2) Do s2disk > > > > s2disk > > > > --- > > s2disk: Unable to switch virtual terminals, using the current console. > > s2disk: Snapshotting system > > s2disk: System snapshot ready. Preparing to write > > s2disk: Image size: 2024124 kilobytes > > s2disk: Free swap: 3791208 kilobytes > > s2disk: Saving 506031 image data pages (press backspace to abort) ... 0% > > > > #Problem>: ... there is stays and blocks. SysRq still responds, so that I > could > > trigger various debug outputs. According to your dmesg s2disk is stuck in balance_dirty_pages(): [ 215.645240] s2disk D ffff88011fd93100 0 3323 3261 0x00000000 [ 215.645240] ffff8801196d4110 0000000000000082 0000000000013100 ffff8801196d4110 [ 215.645240] ffff8800365cdfd8 ffff880119ed9190 00000000ffffc16c ffff8800365cdbe8 [ 215.645240] 0000000000000032 0000000000000032 ffff8801196d4110 0000000000000000 [ 215.645240] Call Trace: [ 215.645240] [<ffffffff8162fdce>] ? schedule_timeout+0xde/0xff [ 215.645240] [<ffffffff81041be1>] ? ftrace_raw_output_tick_stop+0x55/0x55 [ 215.645240] [<ffffffff81630987>] ? io_schedule_timeout+0x5d/0x7e [ 215.645240] [<ffffffff810cb035>] ? balance_dirty_pages_ratelimited+0x588/0x747 [ 215.645240] [<ffffffff812d0795>] ? radix_tree_tag_set+0x69/0xc4 [ 215.645240] [<ffffffff810c244e>] ? generic_file_buffered_write+0x1a8/0x21c [ 215.645240] [<ffffffff810c351e>] ? __generic_file_aio_write+0x1c7/0x1fe [ 215.645240] [<ffffffff81134ab5>] ? blkdev_aio_write+0x44/0x79 [ 215.645240] [<ffffffff8110c02a>] ? do_sync_write+0x56/0x76 [ 215.645240] [<ffffffff8110c33c>] ? vfs_write+0xa1/0xfb [ 215.645240] [<ffffffff8110ca08>] ? SyS_write+0x41/0x74 [ 215.645240] [<ffffffff81637622>] ? system_call_fastpath+0x16/0x1b but I don't see a flusher thread anywhere. 
What the bisected change does is allow the effective dirty threshold to
drop fairly low, because anonymous pages are no longer considered
dirtyable, and your usecase has particularly low free + cache pages:

[  196.375988] active_anon:328150 inactive_anon:118571 isolated_anon:0
[  196.375988]  active_file:1658 inactive_file:1823 isolated_file:0
[  196.375988]  unevictable:867 dirty:616 writeback:0 unstable:0
[  196.375988]  free:32320 slab_reclaimable:5129 slab_unreclaimable:5080
[  196.375988]  mapped:2684 shmem:424844 pagetables:1528 bounce:0
[  196.375988]  free_cma:0

Ignoring free pages due to dirty_balance_reserve, inactive+active file
yields 3481 dirtyable pages, which sets the global limits to 174 pages
background and 348 pages foreground with the default configuration.
It's low, but not 0.

So why is the dirtier throttled to starvation when the background flusher
is not even running?  Shouldn't they be looking at the same numbers and
behave inversely?

I'll dive into the writeback and throttling code, but also Ccing Maxim,
Jan, and Fengguang.  Maybe they have a faster answer.

Thanks for the report!
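For reference, the arithmetic behind those thresholds can be reproduced with a few lines of C. This is only a sketch: the 5%/10% background/foreground ratios are an assumption that matches the numbers quoted above (they are not necessarily the reporter's sysctl settings), and 4 KiB pages are assumed.

---
#include <stdio.h>

int main(void)
{
	/* dirtyable pages after the bisected change: file pages only */
	unsigned long active_file = 1658, inactive_file = 1823;
	unsigned long dirtyable = active_file + inactive_file;	/* 3481 */

	/* assumed ratios matching the figures in the mail above */
	unsigned long background = dirtyable * 5 / 100;		/* 174 pages */
	unsigned long foreground = dirtyable * 10 / 100;	/* 348 pages */

	printf("dirtyable:  %lu pages\n", dirtyable);
	printf("background: %lu pages (%lu kB)\n", background, background * 4);
	printf("foreground: %lu pages (%lu kB)\n", foreground, foreground * 4);
	return 0;
}
---

With the foreground limit at roughly 1.4 MB of dirty data (and 616 pages already dirty in the dump above), a 2 GB image write runs into throttling almost immediately.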
Hello, On Mon 05-05-14 11:35:41, Johannes Weiner wrote: > On Tue, Apr 29, 2014 at 03:24:37PM -0700, Andrew Morton wrote: > > (switched to email. Please respond via emailed reply-to-all, not via the > > bugzilla web interface). > > > > On Tue, 29 Apr 2014 20:13:44 +0000 bugzilla-daemon@bugzilla.kernel.org > wrote: > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=75101 > > > > > > Bug ID: 75101 > > > Summary: [bisected] s2disk / hibernate blocks on "Saving > 506031 > > > image data pages () ..." > > > Product: Memory Management > > > Version: 2.5 > > > Kernel Version: v3.14 > > > Hardware: All > > > OS: Linux > > > Tree: Mainline > > > Status: NEW > > > Severity: normal > > > Priority: P1 > > > Component: Other > > > Assignee: akpm@linux-foundation.org > > > Reporter: oliverml1@oli1170.net > > > Regression: No > > > > > > Created attachment 134271 [details] > > > --> https://bugzilla.kernel.org/attachment.cgi?id=134271&action=edit > > > Full console trace with various SysRq outputs > > > > > > Since v3.14 under normal desktop usage my s2disk/hibernate often blocks > on the > > > saving of the image data ("Saving 506031 image data pages () ..."). > > > > A means to reproduce as well as a bisection result. Nice! Thanks. > > > > Johannes, could you please take a look? > > > > > With following test I can reproduce the problem reliably: > > > --- > > > 0) Boot > > > > > > 1) Fill ram with 2GiB (+50% in my case) > > > > > > mount -t tmpfs tmpfs /media/test/ > > > dd if=/dev/zero of=/media/test/test0.bin bs=1k count=$[1024*1024] > > > dd if=/dev/zero of=/media/test/test1.bin bs=1k count=$[1024*1024] > > > > > > 2) Do s2disk > > > > > > s2disk > > > > > > --- > > > s2disk: Unable to switch virtual terminals, using the current console. > > > s2disk: Snapshotting system > > > s2disk: System snapshot ready. Preparing to write > > > s2disk: Image size: 2024124 kilobytes > > > s2disk: Free swap: 3791208 kilobytes > > > s2disk: Saving 506031 image data pages (press backspace to abort) ... > 0% > > > > > > #Problem>: ... there is stays and blocks. SysRq still responds, so that I > could > > > trigger various debug outputs. > > According to your dmesg s2disk is stuck in balance_dirty_pages(): > > [ 215.645240] s2disk D ffff88011fd93100 0 3323 3261 > 0x00000000 > [ 215.645240] ffff8801196d4110 0000000000000082 0000000000013100 > ffff8801196d4110 > [ 215.645240] ffff8800365cdfd8 ffff880119ed9190 00000000ffffc16c > ffff8800365cdbe8 > [ 215.645240] 0000000000000032 0000000000000032 ffff8801196d4110 > 0000000000000000 > [ 215.645240] Call Trace: > [ 215.645240] [<ffffffff8162fdce>] ? schedule_timeout+0xde/0xff > [ 215.645240] [<ffffffff81041be1>] ? ftrace_raw_output_tick_stop+0x55/0x55 > [ 215.645240] [<ffffffff81630987>] ? io_schedule_timeout+0x5d/0x7e > [ 215.645240] [<ffffffff810cb035>] ? > balance_dirty_pages_ratelimited+0x588/0x747 > [ 215.645240] [<ffffffff812d0795>] ? radix_tree_tag_set+0x69/0xc4 > [ 215.645240] [<ffffffff810c244e>] ? > generic_file_buffered_write+0x1a8/0x21c > [ 215.645240] [<ffffffff810c351e>] ? __generic_file_aio_write+0x1c7/0x1fe > [ 215.645240] [<ffffffff81134ab5>] ? blkdev_aio_write+0x44/0x79 > [ 215.645240] [<ffffffff8110c02a>] ? do_sync_write+0x56/0x76 > [ 215.645240] [<ffffffff8110c33c>] ? vfs_write+0xa1/0xfb > [ 215.645240] [<ffffffff8110ca08>] ? SyS_write+0x41/0x74 > [ 215.645240] [<ffffffff81637622>] ? system_call_fastpath+0x16/0x1b > > but I don't see a flusher thread anywhere. 
>
> What the bisected change does is allow the effective dirty threshold
> to drop fairly low, because anonymous pages are no longer considered
> dirtyable, and your usecase has particularly low free + cache pages:
>
> [  196.375988] active_anon:328150 inactive_anon:118571 isolated_anon:0
> [  196.375988]  active_file:1658 inactive_file:1823 isolated_file:0
> [  196.375988]  unevictable:867 dirty:616 writeback:0 unstable:0
> [  196.375988]  free:32320 slab_reclaimable:5129 slab_unreclaimable:5080
> [  196.375988]  mapped:2684 shmem:424844 pagetables:1528 bounce:0
> [  196.375988]  free_cma:0
>
> Ignoring free pages due to dirty_balance_reserve, inactive+active file
> yields 3481 dirtyable pages, which sets the global limits to 174 pages
> background and 348 pages foreground with the default configuration.
> It's low, but not 0.

OK, so we are over the dirty_limit.

> So why is the dirtier throttled to starvation when the background flusher
> is not even running? Shouldn't they be looking at the same numbers and
> behave inversely?

These days there isn't a background flusher thread but a workqueue which
handles the flushing work. But still you should see that in a process list
like "flush-$dev". Can you check whether balance_dirty_pages() properly
calls bdi_start_background_writeback() and whether wb_do_writeback() gets
to run (there are tracepoints in there)? Also can you have a look in
/sys/kernel/debug/bdi/<dev>/stats? What is the estimated bandwidth?

								Honza
Created attachment 135171 [details]
session.log.s2disk.20140505_2238.bz2

Hello,

1) Attached a full function-trace log + other SysRq outputs, see [1].

I saw bdi_...() calls in the s2disk paths, but didn't check in detail.
Probably more efficient when one of you guys looks directly.

2) /sys/kernel/debug/bdi/<dev>/stats

They are also in [1] - however, the major/minors of my sdbX didn't match
the /sys/.../bdi/<dev>'s, so I just displayed them all.

3) What is the estimated bandwidth?

It's a Samsung SSD 840 PRO; in this system: read 237 MB/s, write 265 MB/s -
see [2] (the faster writing is maybe due to caching?).

Just out of curiosity:

Can you also reproduce it? ... since the test is quite simple.
Or is it something specific to my system here?

BR, Oliver

---
[1] Attached session.log.s2disk.20140505_2238.bz2
    - 18 MiB uncompressed function-trace output + others
    - The bdi outputs are also in there

[2] Rough bandwidth tests
Read:
---
gamix64:~# swapon -s
Filename                Type            Size    Used    Priority
/dev/sdb7               partition       4193276 0       -1
gamix64:~# dd if=/dev/sdb7 bs=1024 count=$[1024*1024*4] |pv >/dev/null
   4GB 0:00:18 [ 226MB/s] [   <=>   ]4193280+0 records in
4193280+0 records out
4293918720 bytes (4.3 GB) copied, 18.1509 s, 237 MB/s
---

Write:
---
gamix64:~# dd if=/dev/zero bs=1024 count=$[1024*1024*4] |pv >/root/Test/test1.bin
4194304+0 records in
4194304+0 records out
4294967296 bytes (4.3 GB) copied, 16.2039 s, 265 MB/s
   4GB 0:00:15 [ 256MB/s] [   <=>   ]
---

On Mon, 5 May 2014 18:10:53 +0200 Jan Kara <jack@suse.cz> wrote:
> > Ignoring free pages due to dirty_balance_reserve, inactive+active
> > file yields 3481 dirtyable pages, which sets the global limits to
> > 174 pages background and 348 pages foreground with the default
> > configuration. It's low, but not 0.
> OK, so we are over the dirty_limit.
>
> > So why is the dirtier throttled to starvation when the background
> > flusher is not even running? Shouldn't they be looking at the same
> > numbers and behave inversely?
> These days there isn't a background flusher thread but a workqueue
> which handles the flushing work. But still you should see that in a
> process list like "flush-$dev". Can you check whether
> balance_dirty_pages() properly calls bdi_start_background_writeback()
> and whether wb_do_writeback() gets to run (there are tracepoints in
> there)? Also can you have a look
> in /sys/kernel/debug/bdi/<dev>/stats? What is the estimated bandwidth?
>
>								Honza
Hi Oliver, On Mon, May 05, 2014 at 11:00:13PM +0200, Oliver Winker wrote: > Hello, > > 1) Attached a full function-trace log + other SysRq outputs, see [1] > attached. > > I saw bdi_...() calls in the s2disk paths, but didn't check in detail > Probably more efficient when one of you guys looks directly. Thanks, this looks interesting. balance_dirty_pages() wakes up the bdi_wq workqueue as it should: [ 249.148009] s2disk-3327 2.... 48550413us : global_dirty_limits <-balance_dirty_pages_ratelimited [ 249.148009] s2disk-3327 2.... 48550414us : global_dirtyable_memory <-global_dirty_limits [ 249.148009] s2disk-3327 2.... 48550414us : writeback_in_progress <-balance_dirty_pages_ratelimited [ 249.148009] s2disk-3327 2.... 48550414us : bdi_start_background_writeback <-balance_dirty_pages_ratelimited [ 249.148009] s2disk-3327 2.... 48550414us : mod_delayed_work_on <-balance_dirty_pages_ratelimited [ 249.148009] s2disk-3327 2.... 48550414us : try_to_grab_pending <-mod_delayed_work_on [ 249.148009] s2disk-3327 2d... 48550414us : del_timer <-try_to_grab_pending [ 249.148009] s2disk-3327 2d... 48550415us : get_work_pool <-try_to_grab_pending [ 249.148009] s2disk-3327 2d... 48550415us : _raw_spin_lock <-try_to_grab_pending [ 249.148009] s2disk-3327 2d... 48550415us : get_work_pwq <-try_to_grab_pending [ 249.148009] s2disk-3327 2d... 48550415us : pwq_activate_delayed_work <-try_to_grab_pending [ 249.148009] s2disk-3327 2d... 48550415us : get_work_pwq <-pwq_activate_delayed_work [ 249.148009] s2disk-3327 2d... 48550415us : move_linked_works <-pwq_activate_delayed_work [ 249.148009] s2disk-3327 2d... 48550415us : get_work_pwq <-try_to_grab_pending [ 249.148009] s2disk-3327 2d... 48550416us : pwq_dec_nr_in_flight <-try_to_grab_pending [ 249.148009] s2disk-3327 2d... 48550416us : __queue_delayed_work <-mod_delayed_work_on [ 249.148009] s2disk-3327 2d... 48550416us : __queue_work <-mod_delayed_work_on [ 249.148009] s2disk-3327 2d... 48550416us : get_work_pool <-__queue_work [ 249.148009] s2disk-3327 2d... 48550416us : _raw_spin_lock <-__queue_work [ 249.148009] s2disk-3327 2d... 48550416us : insert_work <-__queue_work [ 249.148009] s2disk-3327 2d... 48550417us : get_pwq.isra.20 <-insert_work [ 249.148009] s2disk-3327 2d... 48550417us : wake_up_worker <-__queue_work [ 249.148009] s2disk-3327 2d... 48550417us : wake_up_process <-__queue_work [ 249.148009] s2disk-3327 2d... 48550417us : try_to_wake_up <-__queue_work [ 249.148009] s2disk-3327 2d... 48550417us : _raw_spin_lock_irqsave <-try_to_wake_up [ 249.148009] s2disk-3327 2d... 48550417us : task_waking_fair <-try_to_wake_up [ 249.148009] s2disk-3327 2d... 48550418us : select_task_rq_fair <-select_task_rq [ 249.148009] s2disk-3327 2d... 48550418us : idle_cpu <-select_task_rq_fair [ 249.148009] s2disk-3327 2d... 48550418us : idle_cpu <-select_task_rq_fair [ 249.148009] s2disk-3327 2d... 48550418us : cpus_share_cache <-try_to_wake_up [ 249.148009] s2disk-3327 2d... 48550418us : _raw_spin_lock <-try_to_wake_up [ 249.148009] s2disk-3327 2d... 48550419us : ttwu_do_activate.constprop.100 <-try_to_wake_up [ 249.148009] s2disk-3327 2d... 48550419us : activate_task <-ttwu_do_activate.constprop.100 [ 249.148009] s2disk-3327 2d... 48550419us : enqueue_task <-ttwu_do_activate.constprop.100 [ 249.148009] s2disk-3327 2d... 48550419us : update_rq_clock <-enqueue_task [ 249.148009] s2disk-3327 2d... 48550419us : enqueue_task_fair <-ttwu_do_activate.constprop.100 [ 249.148009] s2disk-3327 2d... 48550419us : update_curr <-enqueue_task_fair [ 249.148009] s2disk-3327 2d... 
48550420us : update_min_vruntime <-update_curr [ 249.148009] s2disk-3327 2d... 48550420us : __compute_runnable_contrib.part.55 <-update_entity_load_avg [ 249.148009] s2disk-3327 2d... 48550420us : update_cfs_rq_blocked_load <-enqueue_task_fair [ 249.148009] s2disk-3327 2d... 48550420us : account_entity_enqueue <-enqueue_task_fair [ 249.148009] s2disk-3327 2d... 48550420us : update_cfs_shares <-enqueue_task_fair [ 249.148009] s2disk-3327 2d... 48550420us : __enqueue_entity <-enqueue_task_fair [ 249.148009] s2disk-3327 2d... 48550421us : hrtick_update <-ttwu_do_activate.constprop.100 [ 249.148009] s2disk-3327 2d... 48550421us : wq_worker_waking_up <-ttwu_do_activate.constprop.100 [ 249.148009] s2disk-3327 2d... 48550421us : kthread_data <-wq_worker_waking_up [ 249.148009] s2disk-3327 2d... 48550421us : ttwu_do_wakeup <-try_to_wake_up [ 249.148009] s2disk-3327 2d... 48550421us : check_preempt_curr <-ttwu_do_wakeup [ 249.148009] s2disk-3327 2d... 48550421us : check_preempt_wakeup <-check_preempt_curr [ 249.148009] s2disk-3327 2d... 48550422us : update_curr <-check_preempt_wakeup [ 249.148009] s2disk-3327 2d... 48550422us : wakeup_preempt_entity.isra.53 <-check_preempt_wakeup [ 249.148009] s2disk-3327 2d... 48550422us : _raw_spin_unlock_irqrestore <-try_to_wake_up [ 249.148009] s2disk-3327 2.... 48550423us : bdi_dirty_limit <-bdi_dirty_limits [ 249.148009] s2disk-3327 2d... 48550423us : _raw_spin_lock_irqsave <-__percpu_counter_sum [ 249.148009] s2disk-3327 2d... 48550423us : _raw_spin_unlock_irqrestore <-__percpu_counter_sum [ 249.148009] s2disk-3327 2d... 48550423us : _raw_spin_lock_irqsave <-__percpu_counter_sum [ 249.148009] s2disk-3327 2d... 48550424us : _raw_spin_unlock_irqrestore <-__percpu_counter_sum [ 249.148009] s2disk-3327 2.... 48550424us : bdi_position_ratio <-balance_dirty_pages_ratelimited [ 249.148009] s2disk-3327 2.... 48550424us : io_schedule_timeout <-balance_dirty_pages_ratelimited [ 249.148009] s2disk-3327 2.... 48550424us : __delayacct_blkio_start <-io_schedule_timeout [ 249.148009] s2disk-3327 2.... 48550424us : ktime_get_ts <-io_schedule_timeout [ 249.148009] s2disk-3327 2.... 48550424us : blk_flush_plug_list <-io_schedule_timeout [ 249.148009] s2disk-3327 2.... 48550425us : schedule_timeout <-io_schedule_timeout [ 249.148009] s2disk-3327 2.... 48550425us : lock_timer_base.isra.35 <-__mod_timer [ 249.148009] s2disk-3327 2.... 48550425us : _raw_spin_lock_irqsave <-lock_timer_base.isra.35 [ 249.148009] s2disk-3327 2d... 48550425us : detach_if_pending <-__mod_timer [ 249.148009] s2disk-3327 2d... 48550425us : idle_cpu <-__mod_timer [ 249.148009] s2disk-3327 2d... 48550425us : internal_add_timer <-__mod_timer [ 249.148009] s2disk-3327 2d... 48550425us : __internal_add_timer <-internal_add_timer [ 249.148009] s2disk-3327 2d... 48550426us : _raw_spin_unlock_irqrestore <-__mod_timer [ 249.148009] s2disk-3327 2.... 48550426us : schedule <-schedule_timeout [ 249.148009] s2disk-3327 2.... 48550426us : __schedule <-schedule_timeout [ 249.148009] s2disk-3327 2.... 48550426us : rcu_note_context_switch <-__schedule [ 249.148009] s2disk-3327 2.... 48550426us : rcu_sched_qs <-rcu_note_context_switch [ 249.148009] s2disk-3327 2.... 48550426us : _raw_spin_lock_irq <-__schedule [ 249.148009] s2disk-3327 2d... 48550427us : deactivate_task <-__schedule [ 249.148009] s2disk-3327 2d... 48550427us : dequeue_task <-__schedule [ 249.148009] s2disk-3327 2d... 48550427us : update_rq_clock <-dequeue_task [ 249.148009] s2disk-3327 2d... 
48550427us : dequeue_task_fair <-__schedule [ 249.148009] s2disk-3327 2d... 48550427us : update_curr <-dequeue_task_fair [ 249.148009] s2disk-3327 2d... 48550427us : update_min_vruntime <-update_curr [ 249.148009] s2disk-3327 2d... 48550427us : cpuacct_charge <-update_curr [ 249.148009] s2disk-3327 2d... 48550428us : update_cfs_rq_blocked_load <-dequeue_task_fair [ 249.148009] s2disk-3327 2d... 48550428us : clear_buddies <-dequeue_task_fair [ 249.148009] s2disk-3327 2d... 48550428us : account_entity_dequeue <-dequeue_task_fair [ 249.148009] s2disk-3327 2d... 48550428us : update_min_vruntime <-dequeue_task_fair [ 249.148009] s2disk-3327 2d... 48550428us : update_cfs_shares <-dequeue_task_fair [ 249.148009] s2disk-3327 2d... 48550428us : update_curr <-update_cfs_shares [ 249.148009] s2disk-3327 2d... 48550429us : update_min_vruntime <-update_curr [ 249.148009] s2disk-3327 2d... 48550429us : account_entity_dequeue <-update_cfs_shares [ 249.148009] s2disk-3327 2d... 48550429us : account_entity_enqueue <-dequeue_task_fair [ 249.148009] s2disk-3327 2d... 48550429us : update_curr <-dequeue_task_fair [ 249.148009] s2disk-3327 2d... 48550429us : update_cfs_rq_blocked_load <-dequeue_task_fair [ 249.148009] s2disk-3327 2d... 48550429us : clear_buddies <-dequeue_task_fair [ 249.148009] s2disk-3327 2d... 48550429us : account_entity_dequeue <-dequeue_task_fair [ 249.148009] s2disk-3327 2d... 48550430us : update_min_vruntime <-dequeue_task_fair [ 249.148009] s2disk-3327 2d... 48550430us : update_cfs_shares <-dequeue_task_fair [ 249.148009] s2disk-3327 2d... 48550430us : hrtick_update <-__schedule [ 249.148009] s2disk-3327 2d... 48550430us : put_prev_task_fair <-__schedule [ 249.148009] s2disk-3327 2d... 48550430us : pick_next_task_fair <-pick_next_task [ 249.148009] s2disk-3327 2d... 48550430us : clear_buddies <-pick_next_task_fair [ 249.148009] s2disk-3327 2d... 48550431us : __dequeue_entity <-pick_next_task_fair but the worker wakeup doesn't actually do anything: [ 249.148009] kworker/-3466 2d... 48550431us : finish_task_switch <-__schedule [ 249.148009] kworker/-3466 2.... 48550431us : _raw_spin_lock_irq <-worker_thread [ 249.148009] kworker/-3466 2d... 48550431us : need_to_create_worker <-worker_thread [ 249.148009] kworker/-3466 2d... 48550432us : worker_enter_idle <-worker_thread [ 249.148009] kworker/-3466 2d... 48550432us : too_many_workers <-worker_enter_idle [ 249.148009] kworker/-3466 2.... 48550432us : schedule <-worker_thread [ 249.148009] kworker/-3466 2.... 48550432us : __schedule <-worker_thread My suspicion is that this fails because the bdi_wq is frozen at this point and so the flush work never runs until resume, whereas before my patch the effective dirty limit was high enough so that image could be written in one go without being throttled; followed by an fsync() that then writes the pages in the context of the unfrozen s2disk. Does this make sense? Rafael? Tejun? > 2) /sys/kernel/debug/bdi/<dev>/stats > > They are also in [1] - however the major/minors of my sdbX didn't > match with the /sys/.../bdi/<dev>'s. So I just displayed them all. > > 3) What is the estimated bandwith? > > It's an Samsung SSD 840 PRO, in this system: Read: 237 MB/s, Write 265 > MB/s - see [2] (the faster writing is maybe due caching?) > > > Just by curiosity: > > Can you also reproduce it ? ... since the test is quite simple. > Or is it something specific in my system here ? I tried to reproduce it here but could never get to hang it in balance_dirty_pages() like you did. 
Thanks, Johannes > [1] Attached session.log.s2disk.20140505_2238.bz2 > - 18MiB uncompressed function-trace output + others > - The bdi outputs are also in there > > [2] Rough bandwidth tests > Read: > --- > gamix64:~# swapon -s > Filename Type Size Used > Priority > /dev/sdb7 partition 4193276 0 -1 > gamix64:~# dd if=/dev/sdb7 bs=1024 count=$[1024*1024*4] |pv >/dev/null > 4GB 0:00:18 [ 226MB/s] [ > <=> > ]4193280+0 records in > 4193280+0 records out > > 4293918720 bytes (4.3 GB) copied, 18.1509 s, 237 MB/s > --- > > Write: > --- > gamix64:~# dd if=/dev/zero bs=1024 count=$[1024*1024*4] |pv > >/root/Test/test1.bin > 4194304+0 records inMB/s] [ <=> > > ] > 4194304+0 records out > 4294967296 bytes (4.3 GB) copied, 16.2039 s, 265 MB/s > 4GB 0:00:15 [ 256MB/s] [ <=> > > ] > ---
I'm experiencing exactly the same hibernation problem and finally found this
bug report. However, I'm running Ubuntu 14.10 (32 bit) with the default
kernel 3.13.0-24, not 3.14. My machine is an Asus eeePC 1000HE with 2 GB RAM
and a 2 GB swap partition. I recently replaced the hard disk with a Samsung
SSD 840 EVO.

I can reproduce the hang by first starting a couple of large applications
(gimp, google-chrome and firefox) and then running pm-hibernate. The system
hangs after:

s2disk: Snapshotting system
s2disk: System snapshot ready. Preparing to write
s2disk: Image size: 441652 kilobytes
s2disk: Free swap: 1798624 kilobytes
s2disk: Saving 110413 image data pages (press backspace to abort) ... 0%
I meant Ubuntu 14.04, not 14.10 obviously.
I checked the source code of the Ubuntu kernel (linux-image-3.13.0-24-generic), and the Ubuntu modifications indeed include commit a1c3bfb2 from the 3.14 kernel.
Rebuilding the Ubuntu kernel with commit a1c3bf reverted fixes this issue for me as well.
On Tue, May 06, 2014 at 01:45:01AM +0200, Rafael J. Wysocki wrote: > On 5/6/2014 1:33 AM, Johannes Weiner wrote: > >Hi Oliver, > > > >On Mon, May 05, 2014 at 11:00:13PM +0200, Oliver Winker wrote: > >>Hello, > >> > >>1) Attached a full function-trace log + other SysRq outputs, see [1] > >>attached. > >> > >>I saw bdi_...() calls in the s2disk paths, but didn't check in detail > >>Probably more efficient when one of you guys looks directly. > >Thanks, this looks interesting. balance_dirty_pages() wakes up the > >bdi_wq workqueue as it should: > > > >[ 249.148009] s2disk-3327 2.... 48550413us : global_dirty_limits > <-balance_dirty_pages_ratelimited > >[ 249.148009] s2disk-3327 2.... 48550414us : global_dirtyable_memory > <-global_dirty_limits > >[ 249.148009] s2disk-3327 2.... 48550414us : writeback_in_progress > <-balance_dirty_pages_ratelimited > >[ 249.148009] s2disk-3327 2.... 48550414us : > bdi_start_background_writeback <-balance_dirty_pages_ratelimited > >[ 249.148009] s2disk-3327 2.... 48550414us : mod_delayed_work_on > <-balance_dirty_pages_ratelimited > >but the worker wakeup doesn't actually do anything: > >[ 249.148009] kworker/-3466 2d... 48550431us : finish_task_switch > <-__schedule > >[ 249.148009] kworker/-3466 2.... 48550431us : _raw_spin_lock_irq > <-worker_thread > >[ 249.148009] kworker/-3466 2d... 48550431us : need_to_create_worker > <-worker_thread > >[ 249.148009] kworker/-3466 2d... 48550432us : worker_enter_idle > <-worker_thread > >[ 249.148009] kworker/-3466 2d... 48550432us : too_many_workers > <-worker_enter_idle > >[ 249.148009] kworker/-3466 2.... 48550432us : schedule <-worker_thread > >[ 249.148009] kworker/-3466 2.... 48550432us : __schedule > <-worker_thread > > > >My suspicion is that this fails because the bdi_wq is frozen at this > >point and so the flush work never runs until resume, whereas before my > >patch the effective dirty limit was high enough so that image could be > >written in one go without being throttled; followed by an fsync() that > >then writes the pages in the context of the unfrozen s2disk. > > > >Does this make sense? Rafael? Tejun? > > Well, it does seem to make sense to me. From what I see, this is a deadlock in the userspace suspend model and just happened to work by chance in the past. Can we patch suspend-utils as follows? Alternatively, suspend-utils could clear the dirty limits before it starts writing and restore them post-resume. --- From 73d6546d5e264130e3d108c97d8317f86dc11149 Mon Sep 17 00:00:00 2001 From: Johannes Weiner <hannes@cmpxchg.org> Date: Thu, 12 Jun 2014 17:43:05 -0400 Subject: [patch] s2disk: fix buffered IO throttling deadlock in frozen state s2disk uses buffered IO when writing the snapshot image to disk. If it runs into the dirty limits, the kernel forces it to wait until the flusher threads clean some of the dirty pages. However, at this point s2disk already froze the system, including the flusher infrastructure, and the whole operation deadlocks. Open the resume device with O_SYNC to force flushing any dirty pages directly from the write() context before they accumulate and engage dirty throttling. 
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 suspend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/suspend.c b/suspend.c
index 479ce58555f7..1b9bed81f58a 100644
--- a/suspend.c
+++ b/suspend.c
@@ -2436,7 +2436,7 @@ int main(int argc, char *argv[])
 			suspend_error("Could not create %s/%s.", chroot_path, "resume");
 			goto Umount;
 		}
-		resume_fd = open("resume", O_RDWR);
+		resume_fd = open("resume", O_RDWR | O_SYNC);
 		if (resume_fd < 0) {
 			ret = errno;
 			suspend_error("Could not open the resume device.");
The good news is that with the O_SYNC flag, hibernation succeeds now. The bad news is that it takes forever. The write speed drops from 28.9 MB/s to 1.7 MB/s and the total hibernation time increases from 0:18 to 2:50.
On Fri, Jun 13, 2014 at 01:50:47AM +0200, Rafael J. Wysocki wrote: > On 6/13/2014 12:02 AM, Johannes Weiner wrote: > >On Tue, May 06, 2014 at 01:45:01AM +0200, Rafael J. Wysocki wrote: > >>On 5/6/2014 1:33 AM, Johannes Weiner wrote: > >>>Hi Oliver, > >>> > >>>On Mon, May 05, 2014 at 11:00:13PM +0200, Oliver Winker wrote: > >>>>Hello, > >>>> > >>>>1) Attached a full function-trace log + other SysRq outputs, see [1] > >>>>attached. > >>>> > >>>>I saw bdi_...() calls in the s2disk paths, but didn't check in detail > >>>>Probably more efficient when one of you guys looks directly. > >>>Thanks, this looks interesting. balance_dirty_pages() wakes up the > >>>bdi_wq workqueue as it should: > >>> > >>>[ 249.148009] s2disk-3327 2.... 48550413us : global_dirty_limits > <-balance_dirty_pages_ratelimited > >>>[ 249.148009] s2disk-3327 2.... 48550414us : global_dirtyable_memory > <-global_dirty_limits > >>>[ 249.148009] s2disk-3327 2.... 48550414us : writeback_in_progress > <-balance_dirty_pages_ratelimited > >>>[ 249.148009] s2disk-3327 2.... 48550414us : > bdi_start_background_writeback <-balance_dirty_pages_ratelimited > >>>[ 249.148009] s2disk-3327 2.... 48550414us : mod_delayed_work_on > <-balance_dirty_pages_ratelimited > >>>but the worker wakeup doesn't actually do anything: > >>>[ 249.148009] kworker/-3466 2d... 48550431us : finish_task_switch > <-__schedule > >>>[ 249.148009] kworker/-3466 2.... 48550431us : _raw_spin_lock_irq > <-worker_thread > >>>[ 249.148009] kworker/-3466 2d... 48550431us : need_to_create_worker > <-worker_thread > >>>[ 249.148009] kworker/-3466 2d... 48550432us : worker_enter_idle > <-worker_thread > >>>[ 249.148009] kworker/-3466 2d... 48550432us : too_many_workers > <-worker_enter_idle > >>>[ 249.148009] kworker/-3466 2.... 48550432us : schedule > <-worker_thread > >>>[ 249.148009] kworker/-3466 2.... 48550432us : __schedule > <-worker_thread > >>> > >>>My suspicion is that this fails because the bdi_wq is frozen at this > >>>point and so the flush work never runs until resume, whereas before my > >>>patch the effective dirty limit was high enough so that image could be > >>>written in one go without being throttled; followed by an fsync() that > >>>then writes the pages in the context of the unfrozen s2disk. > >>> > >>>Does this make sense? Rafael? Tejun? > >>Well, it does seem to make sense to me. > > From what I see, this is a deadlock in the userspace suspend model and > >just happened to work by chance in the past. > > Well, it had been working for quite a while, so it was a rather large > opportunity > window it seems. :-) No doubt about that, and I feel bad that it broke. But it's still a deadlock that can't reasonably be accommodated from dirty throttling. It can't just put the flushers to sleep and then issue a large amount of buffered IO, hoping it doesn't hit the dirty limits. Don't shoot the messenger, this bug needs to be addressed, not get papered over. > >Can we patch suspend-utils as follows? > > Perhaps we can. Let's ask the new maintainer. > > Rodolfo, do you think you can apply the patch below to suspend-utils? > > >Alternatively, suspend-utils > >could clear the dirty limits before it starts writing and restore them > >post-resume. > > That (and the patch too) doesn't seem to address the problem with existing > suspend-utils > binaries, however. It's userspace that freezes the system before issuing buffered IO, so my conclusion was that the bug is in there. This is arguable. 
I also wouldn't be opposed to a patch that sets the dirty limits to infinity from the ioctl that freezes the system or creates the image.
Created attachment 183141 [details]
suspend-utils: set vm/dirty_bytes before hibernation

I ran into this bug myself; given that it dates from 2014, I wrote a patch.
I hope I understood the issue correctly, as I'm quite new to this stuff.

Basically it remembers and then overrides /proc/sys/vm/dirty_[ratio|bytes],
setting dirty_bytes to (void *)(-1).
Created attachment 183151 [details] suspend-utils: set vm/dirty_bytes before hibernation
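For anyone who wants to try the same idea without rebuilding suspend-utils, below is a minimal standalone sketch of the remember/override/restore approach described in the comment above. It is not taken from the attached patch: the 1 TiB "effectively unlimited" value, the plain s2disk invocation, and the assumption that the ratio knobs (rather than the *_bytes knobs) were the ones in use are all illustrative choices. It needs to run as root and omits error handling.

---
#include <stdio.h>
#include <stdlib.h>

/* read one numeric sysctl from /proc/sys/vm */
static long read_knob(const char *path)
{
	long val = -1;
	FILE *f = fopen(path, "r");

	if (f) {
		fscanf(f, "%ld", &val);
		fclose(f);
	}
	return val;
}

/* write one numeric sysctl under /proc/sys/vm */
static void write_knob(const char *path, long val)
{
	FILE *f = fopen(path, "w");

	if (f) {
		fprintf(f, "%ld\n", val);
		fclose(f);
	}
}

int main(void)
{
	/* remember the ratio-based limits (assumes ratios were the active mode) */
	long ratio    = read_knob("/proc/sys/vm/dirty_ratio");
	long bg_ratio = read_knob("/proc/sys/vm/dirty_background_ratio");

	/* effectively disable dirty throttling: 1 TiB acts as "infinity" here */
	write_knob("/proc/sys/vm/dirty_bytes", 1L << 40);
	write_knob("/proc/sys/vm/dirty_background_bytes", 1L << 40);

	system("s2disk");	/* hibernate; execution continues here after resume */

	/* writing the ratios switches the kernel back to ratio-based limits */
	write_knob("/proc/sys/vm/dirty_ratio", ratio);
	write_knob("/proc/sys/vm/dirty_background_ratio", bg_ratio);

	return 0;
}
---

The same sequence can of course be done from a shell script around s2disk or pm-hibernate; the point is only that the limits are raised before the image write starts and restored after resume.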
(In reply to Killian De Volder from comment #15) > Created attachment 183151 [details] > suspend-utils: set vm/dirty_bytes before hibernation I am on Ubuntu 14.04.3 platform and I use 4.2 Linux kernel. Ubuntu has pm-utils instead of suspend-utils. Can I apply your patch for pm-utils?
(In reply to Atilla from comment #16)
> I am on Ubuntu 14.04.3 platform and I use 4.2 Linux kernel. Ubuntu has
> pm-utils instead of suspend-utils. Can I apply your patch for pm-utils?

I don't know whether pm-utils is like suspend-utils.
IIRC it should only be a bug with pm-utils, unless suspend does the same?
(In reply to Killian De Volder from comment #17)
> (In reply to Atilla from comment #16)
> > I am on Ubuntu 14.04.3 platform and I use 4.2 Linux kernel. Ubuntu has
> > pm-utils instead of suspend-utils. Can I apply your patch for pm-utils?
> I don't know whether pm-utils is like suspend-utils.
> IIRC it should only be a bug with pm-utils, unless suspend does the same?

I found the relevant package: it's uswsusp-1.0, which is where s2disk is
located. I manually applied the patch and tried it last night in the hope
that it would work, but it hung again. It usually hangs if used memory takes
up more than 40% of the RAM.

By the way, I ran pm-hibernate to put it into hibernation. Shall I try it
with s2disk instead? Does that make a difference?
Has this one been solved already? Asking because I've encountered something similar during tests (kernel 4.9.20): "s2both: Saving 1474608 image data pages (press backspace to abort) ... 68%" And there it stalled. Reset necessary. 1474608 pages is close to the max for that box (max_size = 1510475 pages). So I'm wondering whether failure could have been due to this bug or just to overload.
Could this be related to "early writeout = y" in etc/suspend.conf? The comment there says: ## start writing out the image early, before buffers are full. ## will most of the time speed up overall writing time (default y) After "early writeout = n", I couldn't reproduce the bug any more so far, despite quite high memory-load. But I haven't tested it thoroughly. Speedwise I couldn't see a major difference, if any. So, for those affected it may be worth a try.
(In reply to Rainer Fiebig from comment #20) > Could this be related to "early writeout = y" in etc/suspend.conf? > > The comment there says: > ## start writing out the image early, before buffers are full. > ## will most of the time speed up overall writing time (default y) > > After "early writeout = n", I couldn't reproduce the bug any more so far, > despite quite high memory-load. But I haven't tested it thoroughly. > > Speedwise I couldn't see a major difference, if any. > > So, for those affected it may be worth a try. I don't think it has something to do with `"early writeout = y" in etc/suspend.conf`. I did a clean installation of Ubuntu 16.04 with 4.4.0-21 kernel and it suspends and hibenates well.
(In reply to Rainer Fiebig from comment #20) > Could this be related to "early writeout = y" in etc/suspend.conf? > > The comment there says: > ## start writing out the image early, before buffers are full. > ## will most of the time speed up overall writing time (default y) > > After "early writeout = n", I couldn't reproduce the bug any more so far, > despite quite high memory-load. But I haven't tested it thoroughly. > > Speedwise I couldn't see a major difference, if any. > > So, for those affected it may be worth a try. May be it's related to swap partition
(In reply to Atilla from comment #21) > (In reply to Rainer Fiebig from comment #20) > > Could this be related to "early writeout = y" in etc/suspend.conf? > > > > The comment there says: > > ## start writing out the image early, before buffers are full. > > ## will most of the time speed up overall writing time (default y) > > > > After "early writeout = n", I couldn't reproduce the bug any more so far, > > despite quite high memory-load. But I haven't tested it thoroughly. > > > > Speedwise I couldn't see a major difference, if any. > > > > So, for those affected it may be worth a try. > > I don't think it has something to do with `"early writeout = y" in > etc/suspend.conf`. I did a clean installation of Ubuntu 16.04 with 4.4.0-21 > kernel and it suspends and hibenates well. I should have been more specific. On my system this bug is a rare event and only occurred when I was doing tests with extremely high memory-load - close to what s2disk can handle. I've never seen it in normal cases. Now - if this bug generally only occurs when s2disk is near its limit (true for my system but induction of course) and buffering may have something to do with it (as far as I understand Comment 13) *and* "early writeout" has something to do with buffering then switching it off may be worth a try. And see what happens. If it still occurs we at least know that "early writeout" is not to blame.
(In reply to Atilla from comment #22) > (In reply to Rainer Fiebig from comment #20) > > Could this be related to "early writeout = y" in etc/suspend.conf? > > > > The comment there says: > > ## start writing out the image early, before buffers are full. > > ## will most of the time speed up overall writing time (default y) > > > > After "early writeout = n", I couldn't reproduce the bug any more so far, > > despite quite high memory-load. But I haven't tested it thoroughly. > > > > Speedwise I couldn't see a major difference, if any. > > > > So, for those affected it may be worth a try. > > May be it's related to swap partition Not on my system - that's for sure.
Wow! Here I am to revive this topic in 2019!

I have exactly the same problem, on Ubuntu 18.04.2 with basically all kernels
from 4.15.0-42 up to 5 (which was all I tested), currently on
4.18.0-17-generic... I guess this has nothing to do with the kernel anyway.
It was working fine before, even with the proprietary nvidia drivers, which
would generally cause a bug on resume and not while saving the RAM snapshot.

I've been trying to tell this to the Ubuntu guys and you can see my whole
story with this problem here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1819915

In short, I tried with and without the nvidia modules enabled (or intel, or
using nouveau), many different kernels, and with i915 disabled, and this is
all I get in all those different combinations:
https://launchpadlibrarian.net/417327528/i915.jpg

The event is pretty random and seems to be more likely to happen after 2 or
4 GB of RAM have been used at some point (I have 16 in total), and nothing
changes if I reduce the RAM usage later. But it is random: I successfully
hibernated with 11 GB in use yesterday, then resumed and hibernated again
5 seconds later without doing anything other than running hibernate, and it
froze there. This also happens randomly if there are just 2 or 3 GB in use,
typically on the second attempt or more than 5 minutes after the computer
has been on.

What can be wrong here?
I cc'ed a bunch of people from bugzilla. Folks, please please please remember to reply via emailed reply-to-all. Don't use the bugzilla interface! On Mon, 16 Jun 2014 18:29:26 +0200 "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> wrote: > On 6/13/2014 6:55 AM, Johannes Weiner wrote: > > On Fri, Jun 13, 2014 at 01:50:47AM +0200, Rafael J. Wysocki wrote: > >> On 6/13/2014 12:02 AM, Johannes Weiner wrote: > >>> On Tue, May 06, 2014 at 01:45:01AM +0200, Rafael J. Wysocki wrote: > >>>> On 5/6/2014 1:33 AM, Johannes Weiner wrote: > >>>>> Hi Oliver, > >>>>> > >>>>> On Mon, May 05, 2014 at 11:00:13PM +0200, Oliver Winker wrote: > >>>>>> Hello, > >>>>>> > >>>>>> 1) Attached a full function-trace log + other SysRq outputs, see [1] > >>>>>> attached. > >>>>>> > >>>>>> I saw bdi_...() calls in the s2disk paths, but didn't check in detail > >>>>>> Probably more efficient when one of you guys looks directly. > >>>>> Thanks, this looks interesting. balance_dirty_pages() wakes up the > >>>>> bdi_wq workqueue as it should: > >>>>> > >>>>> [ 249.148009] s2disk-3327 2.... 48550413us : global_dirty_limits > <-balance_dirty_pages_ratelimited > >>>>> [ 249.148009] s2disk-3327 2.... 48550414us : > global_dirtyable_memory <-global_dirty_limits > >>>>> [ 249.148009] s2disk-3327 2.... 48550414us : > writeback_in_progress <-balance_dirty_pages_ratelimited > >>>>> [ 249.148009] s2disk-3327 2.... 48550414us : > bdi_start_background_writeback <-balance_dirty_pages_ratelimited > >>>>> [ 249.148009] s2disk-3327 2.... 48550414us : mod_delayed_work_on > <-balance_dirty_pages_ratelimited > >>>>> but the worker wakeup doesn't actually do anything: > >>>>> [ 249.148009] kworker/-3466 2d... 48550431us : finish_task_switch > <-__schedule > >>>>> [ 249.148009] kworker/-3466 2.... 48550431us : _raw_spin_lock_irq > <-worker_thread > >>>>> [ 249.148009] kworker/-3466 2d... 48550431us : > need_to_create_worker <-worker_thread > >>>>> [ 249.148009] kworker/-3466 2d... 48550432us : worker_enter_idle > <-worker_thread > >>>>> [ 249.148009] kworker/-3466 2d... 48550432us : too_many_workers > <-worker_enter_idle > >>>>> [ 249.148009] kworker/-3466 2.... 48550432us : schedule > <-worker_thread > >>>>> [ 249.148009] kworker/-3466 2.... 48550432us : __schedule > <-worker_thread > >>>>> > >>>>> My suspicion is that this fails because the bdi_wq is frozen at this > >>>>> point and so the flush work never runs until resume, whereas before my > >>>>> patch the effective dirty limit was high enough so that image could be > >>>>> written in one go without being throttled; followed by an fsync() that > >>>>> then writes the pages in the context of the unfrozen s2disk. > >>>>> > >>>>> Does this make sense? Rafael? Tejun? > >>>> Well, it does seem to make sense to me. > >>> From what I see, this is a deadlock in the userspace suspend model and > >>> just happened to work by chance in the past. > >> Well, it had been working for quite a while, so it was a rather large > >> opportunity > >> window it seems. :-) > > No doubt about that, and I feel bad that it broke. But it's still a > > deadlock that can't reasonably be accommodated from dirty throttling. > > > > It can't just put the flushers to sleep and then issue a large amount > > of buffered IO, hoping it doesn't hit the dirty limits. Don't shoot > > the messenger, this bug needs to be addressed, not get papered over. > > > >>> Can we patch suspend-utils as follows? > >> Perhaps we can. Let's ask the new maintainer. > >> > >> Rodolfo, do you think you can apply the patch below to suspend-utils? 
> >> > >>> Alternatively, suspend-utils > >>> could clear the dirty limits before it starts writing and restore them > >>> post-resume. > >> That (and the patch too) doesn't seem to address the problem with existing > >> suspend-utils > >> binaries, however. > > It's userspace that freezes the system before issuing buffered IO, so > > my conclusion was that the bug is in there. This is arguable. I also > > wouldn't be opposed to a patch that sets the dirty limits to infinity > > from the ioctl that freezes the system or creates the image. > > OK, that sounds like a workable plan. > > How do I set those limits to infinity? Five years have passed and people are still hitting this. Killian described the workaround in comment 14 at https://bugzilla.kernel.org/show_bug.cgi?id=75101. People can use this workaround manually by hand or in scripts. But we really should find a proper solution. Maybe special-case the freezing of the flusher threads until all the writeout has completed. Or something else.
On Tue 02-04-19 16:25:00, Andrew Morton wrote: > > I cc'ed a bunch of people from bugzilla. > > Folks, please please please remember to reply via emailed > reply-to-all. Don't use the bugzilla interface! > > On Mon, 16 Jun 2014 18:29:26 +0200 "Rafael J. Wysocki" > <rafael.j.wysocki@intel.com> wrote: > > > On 6/13/2014 6:55 AM, Johannes Weiner wrote: > > > On Fri, Jun 13, 2014 at 01:50:47AM +0200, Rafael J. Wysocki wrote: > > >> On 6/13/2014 12:02 AM, Johannes Weiner wrote: > > >>> On Tue, May 06, 2014 at 01:45:01AM +0200, Rafael J. Wysocki wrote: > > >>>> On 5/6/2014 1:33 AM, Johannes Weiner wrote: > > >>>>> Hi Oliver, > > >>>>> > > >>>>> On Mon, May 05, 2014 at 11:00:13PM +0200, Oliver Winker wrote: > > >>>>>> Hello, > > >>>>>> > > >>>>>> 1) Attached a full function-trace log + other SysRq outputs, see [1] > > >>>>>> attached. > > >>>>>> > > >>>>>> I saw bdi_...() calls in the s2disk paths, but didn't check in > detail > > >>>>>> Probably more efficient when one of you guys looks directly. > > >>>>> Thanks, this looks interesting. balance_dirty_pages() wakes up the > > >>>>> bdi_wq workqueue as it should: > > >>>>> > > >>>>> [ 249.148009] s2disk-3327 2.... 48550413us : > global_dirty_limits <-balance_dirty_pages_ratelimited > > >>>>> [ 249.148009] s2disk-3327 2.... 48550414us : > global_dirtyable_memory <-global_dirty_limits > > >>>>> [ 249.148009] s2disk-3327 2.... 48550414us : > writeback_in_progress <-balance_dirty_pages_ratelimited > > >>>>> [ 249.148009] s2disk-3327 2.... 48550414us : > bdi_start_background_writeback <-balance_dirty_pages_ratelimited > > >>>>> [ 249.148009] s2disk-3327 2.... 48550414us : > mod_delayed_work_on <-balance_dirty_pages_ratelimited > > >>>>> but the worker wakeup doesn't actually do anything: > > >>>>> [ 249.148009] kworker/-3466 2d... 48550431us : finish_task_switch > <-__schedule > > >>>>> [ 249.148009] kworker/-3466 2.... 48550431us : _raw_spin_lock_irq > <-worker_thread > > >>>>> [ 249.148009] kworker/-3466 2d... 48550431us : > need_to_create_worker <-worker_thread > > >>>>> [ 249.148009] kworker/-3466 2d... 48550432us : worker_enter_idle > <-worker_thread > > >>>>> [ 249.148009] kworker/-3466 2d... 48550432us : too_many_workers > <-worker_enter_idle > > >>>>> [ 249.148009] kworker/-3466 2.... 48550432us : schedule > <-worker_thread > > >>>>> [ 249.148009] kworker/-3466 2.... 48550432us : __schedule > <-worker_thread > > >>>>> > > >>>>> My suspicion is that this fails because the bdi_wq is frozen at this > > >>>>> point and so the flush work never runs until resume, whereas before > my > > >>>>> patch the effective dirty limit was high enough so that image could > be > > >>>>> written in one go without being throttled; followed by an fsync() > that > > >>>>> then writes the pages in the context of the unfrozen s2disk. > > >>>>> > > >>>>> Does this make sense? Rafael? Tejun? > > >>>> Well, it does seem to make sense to me. > > >>> From what I see, this is a deadlock in the userspace suspend model and > > >>> just happened to work by chance in the past. > > >> Well, it had been working for quite a while, so it was a rather large > > >> opportunity > > >> window it seems. :-) > > > No doubt about that, and I feel bad that it broke. But it's still a > > > deadlock that can't reasonably be accommodated from dirty throttling. > > > > > > It can't just put the flushers to sleep and then issue a large amount > > > of buffered IO, hoping it doesn't hit the dirty limits. Don't shoot > > > the messenger, this bug needs to be addressed, not get papered over. 
> > > > > >>> Can we patch suspend-utils as follows? > > >> Perhaps we can. Let's ask the new maintainer. > > >> > > >> Rodolfo, do you think you can apply the patch below to suspend-utils? > > >> > > >>> Alternatively, suspend-utils > > >>> could clear the dirty limits before it starts writing and restore them > > >>> post-resume. > > >> That (and the patch too) doesn't seem to address the problem with > existing > > >> suspend-utils > > >> binaries, however. > > > It's userspace that freezes the system before issuing buffered IO, so > > > my conclusion was that the bug is in there. This is arguable. I also > > > wouldn't be opposed to a patch that sets the dirty limits to infinity > > > from the ioctl that freezes the system or creates the image. > > > > OK, that sounds like a workable plan. > > > > How do I set those limits to infinity? > > Five years have passed and people are still hitting this. > > Killian described the workaround in comment 14 at > https://bugzilla.kernel.org/show_bug.cgi?id=75101. > > People can use this workaround manually by hand or in scripts. But we > really should find a proper solution. Maybe special-case the freezing > of the flusher threads until all the writeout has completed. Or > something else.

I've refreshed my memory wrt this bug and I believe the bug is really on the side of suspend-utils (uswsusp or however it is called). They are low-level system tools: they ask the kernel to freeze all processes (the SNAPSHOT_FREEZE ioctl) and then rely on buffered writeback (which is relatively heavyweight infrastructure) to work. That is wrong in my opinion.

I can see Johannes was suggesting in comment 11 to use O_SYNC in suspend-utils, which worked but was too slow. Indeed O_SYNC is a rather big hammer, but using O_DIRECT should give them what they need with better performance - no additional buffering in the kernel, no dirty throttling, etc. They only need their buffer & device offsets sector aligned - they seem to be even page aligned in suspend-utils, so they should be fine. And if the performance still sucks (currently they appear to do mostly random 4k writes, so it probably would for rotating disks), they could use AIO DIO to get multiple pages in flight (as many as they dare to allocate buffers) and then the IO scheduler will reorder things as well as it can and they should get reasonable performance.

Is there someone who works on suspend-utils these days? Because the repo I've found on kernel.org seems to be long dead (last commit in 2012).

Honza
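As a rough illustration of the O_DIRECT approach suggested above (this is not code from suspend-utils; the target device "/dev/sdXN" and the single-page write are placeholders): the buffer comes from posix_memalign(), the device offset is page aligned as well, and because O_DIRECT bypasses the page cache the write never creates dirty pages and is never throttled against the frozen flusher workqueue.

---
/* Sketch: one page-aligned direct write to a placeholder block device. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define PAGE_SZ 4096

int main(void)
{
	void *buf = NULL;
	int fd;

	/* O_DIRECT: no page cache, hence no dirty pages, no throttling. */
	fd = open("/dev/sdXN", O_WRONLY | O_DIRECT);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* O_DIRECT needs an aligned user buffer; page alignment is enough. */
	if (posix_memalign(&buf, PAGE_SZ, PAGE_SZ)) {
		fprintf(stderr, "posix_memalign failed\n");
		close(fd);
		return 1;
	}
	memset(buf, 0xaa, PAGE_SZ);

	/* The device offset must be aligned as well (here: page 10). */
	if (pwrite(fd, buf, PAGE_SZ, (off_t)10 * PAGE_SZ) != PAGE_SZ) {
		perror("pwrite");
		free(buf);
		close(fd);
		return 1;
	}

	free(buf);
	close(fd);
	return 0;
}
---

With Linux AIO on top of O_DIRECT (io_setup()/io_submit()), several such page-sized writes could be kept in flight at once, which is the "multiple pages in flight" idea mentioned above.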
Yes, I can sorta confirm the bug is in uswsusp. I removed the package and pm-utils and used both "systemctl hibernate" and "echo disk >> /sys/power/state" to hibernate. It seems to succeed and shuts down; I'm just not able to resume from it, which seems to be a classic problem usually solved just by setting the resume swap file/partition in grub (which I tried, and it didn't work even with nvidia disabled).

Anyway, uswsusp is still necessary because the default kernel hibernation doesn't work with the proprietary nvidia drivers, as far as I know and have tested.

Is there any way I could get a workaround for this bug on my current OS, by the way?

On Wed, Apr 3, 2019 at 7:04 AM Rainer Fiebig <jrf@mailbox.org> wrote:
>
> Am 03.04.19 um 11:34 schrieb Jan Kara:
> > [...]
>
> Whether it's suspend-utils (or uswsusp) or not could be answered quickly
> by de-installing this package and using the kernel-methods instead.
>
>
Created attachment 282113 [details] pm-suspend.log

Okay, I reinstalled pm-utils and made sure uswsusp was removed (apt remove --purge uswsusp).

# fdisk -l | grep swap
/dev/sda8 439762944 473214975 33452032 16G Linux swap

root@matheus-Inspiron-15-7000-Gaming:/home/matheus# blkid /dev/sda8
/dev/sda8: UUID="70d967e6-ad52-4c21-baf0-01a813ccc6ac" TYPE="swap" PARTUUID="666096bb-0e72-431a-b981-9fd0c7e553ee"

I have the resume=70d967e6-ad52-4c21-baf0-01a813ccc6ac variable set on the Linux kernel command line (GRUB_CMDLINE_LINUX_DEFAULT), as you can see in my attached boot-sequence. You can see more info about my setup and what I already did here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1819915

> What doesn't work: hibernating or resuming?
> And /var/log/pm-suspend.log might give you a clue what causes the problem.

Resuming didn't work and still doesn't work. I tried setting it either to my partition /dev/sda8 or to the UUID. It simply boots as if it was the first boot.

On Wed, Apr 3, 2019 at 2:55 PM Rainer Fiebig <jrf@mailbox.org> wrote:
>
> Am 03.04.19 um 18:59 schrieb Matheus Fillipe:
> > Yes I can sorta confirm the bug is in uswsusp. I removed the package
> > and pm-utils
>
> Matheus,
>
> there is no need to uninstall pm-utils. You actually need this to have
> comfortable suspend/hibernate.
>
> The only additional option you will get from uswsusp is true s2both
> (which is nice, imo).
>
> pm-utils provides something similar called "suspend-hybrid" which means
> that the computer suspends and after a configurable time wakes up again
> to go into hibernation.
>
> > and used both "systemctl hibernate" and "echo disk >>
> > /sys/power/state" to hibernate. It seems to succeed and shuts down, I
> > am just not able to resume from it, which seems to be a classical
> > problem solved just by setting the resume swap file/partition on grub.
> > (which i tried and didn't work even with nvidia disabled)
> >
> > Anyway uswsusp is still necessary because the default kernel
> > hibernation doesn't work with the proprietary nvidia drivers as long
> > as I know and tested.
>
> What doesn't work: hibernating or resuming?
> And /var/log/pm-suspend.log might give you a clue what causes the problem.
>
> > Is there anyway I could get any workaround to this bug on my current
> > OS by the way?
>
> *I* don't know, I don't use Ubuntu. But what I would do now is
> re-install pm-utils *without* uswsusp and make sure that you have got
> the swap-partition/file right in grub.cfg or menu.lst (grub legacy).
>
> Then do a few pm-hibernate/resume and tell us what happened.
>
> So long!
>
> > On Wed, Apr 3, 2019 at 7:04 AM Rainer Fiebig <jrf@mailbox.org> wrote:
> > [...]
Created attachment 282115 [details] boot-sequence
Created attachment 282117 [details] boot-sequence

Okay, I found a way to get it working, and there was also a huge mistake in my last boot config: the resume entry was commented out :P

I basically followed this: https://askubuntu.com/a/1064114 but changed it to resume=/dev/disk/by-uuid/70d967e6-ad52-4c21-baf0-01a813ccc6ac (just the UUID wouldn't work), and this is probably the most important thing to do. It worked! I also set the resume variable in the initramfs to my swap partition, but this might not be so important anyway since it's automatically detected.

I tested both systemctl hibernate and pm-hibernate; I guess they call the same thing anyway. I attached a screenshot. Seems to be working fine without uswsusp and with the proprietary nvidia drivers!

On Wed, Apr 3, 2019 at 2:55 PM Rainer Fiebig <jrf@mailbox.org> wrote:
> [...]
Created attachment 282119 [details] Screenshot at 2019-04-03 16-42-48.png
On Wednesday, April 3, 2019 11:34:32 AM CEST Jan Kara wrote:
> [...]
>
> Is there someone who works on suspend-utils these days?

Not that I know of.

> Because the repo I've found on kernel.org seems to be long dead
> (last commit in 2012).

And that's where the things are as of today, AFAICS.

Cheers,
Rafael
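For completeness, the "raise the dirty limits around the image write" idea discussed above could look roughly like the sketch below from userspace. This is only an illustration, not the exact workaround referenced in comment 14 (which is not quoted here): the 16 GiB value is an arbitrary large placeholder, and since vm.dirty_bytes/vm.dirty_ratio (and their background counterparts) are paired knobs with only one of each pair active at a time, a real implementation would have to save and restore whichever pair was in use.

---
/* Sketch: lift the dirty limits before the image is written. */
#include <stdio.h>

static int write_knob(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		return -1;
	}
	fputs(val, f);
	return fclose(f);
}

int main(void)
{
	/*
	 * Make the limits so large that buffered writes are never
	 * throttled against the frozen flusher threads while the image
	 * is written (16 GiB here is just a placeholder).  The previous
	 * values should be saved first and restored after resume; that
	 * part is left out of this sketch.
	 */
	write_knob("/proc/sys/vm/dirty_bytes", "17179869184\n");
	write_knob("/proc/sys/vm/dirty_background_bytes", "17179869184\n");

	/* ... SNAPSHOT_FREEZE, write the image, resume, restore ... */

	return 0;
}
---

The same two writes can of course be done from a shell script before calling s2disk and undone after resume.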
> So you got hibernate working now with pm-utils *and* the prop. Nvidia
> drivers. That's good - although a bit contrary to what you said in
> Comment 29:

I was told so a long time ago: struggling to get the nvidia proprietary driver to resume from hibernation, I found out (googling or on IRC) that uswsusp was better for it, and it indeed worked better, or at least with less effort, back then. I made that comment thinking this was still true, but I just proved myself wrong...

> What still puzzles me is that while others are having problems,
> suspend-utils/uswsusp work for me almost 100 % of the time, except for a
> few extreme test-cases in the past. You also said that it worked
> "flawlessly" for you

Yes! It worked pretty well on Ubuntu 18.04.1 with kernels 4.15.0-42 and -41 using uswsusp. There was a long-standing problem with the nvidia proprietary driver that wouldn't let the system resume, but this was fixed when I upgraded to the latest version of the 415 nvidia driver. For about a month I just hibernated to switch to Windows and came back to the restored snapshot of Linux. You can check my apt history here: https://launchpadlibrarian.net/415602746/aptHistory.log. At Start-Date: 2019-02-02 15:40:45 I'm 100% sure it was perfect. I'm also sure that it was no longer working - it was hitting the s2disk freeze issue - by Start-Date: 2019-03-05 10:38:4. uswsusp also worked fine on Ubuntu 16.04, but I don't remember the kernel versions.

I'm currently on nvidia 418.56, Ubuntu 18.04.2, kernel 4.18.0-17-generic, and hibernation with pm-utils works. I haven't found any major problem with it besides failing to suspend to RAM yesterday, which I don't know whether it is related or not; today I tested it before and after hibernation and it seems to be OK.

> So I'm wondering whether used-up swap space might play a role in this
> matter, too. At least for the cases that I've seen on my system, I can't
> rule this out. And when I look at the screenshot you provided in Comment
> 27 (https://launchpadlibrarian.net/417327528/i915.jpg), sparse
> swap-space could have been a factor in that case as well. Because
> roughly 3.5 GB free swap-space doesn't seem much for a 16-GB-RAM box.

In my many tests with uswsusp, a 16 GB swap partition and 16 GB of RAM, I noticed that it was less likely to fail when less than roughly 2 GB of RAM was in use, like just after boot-up; even then it would fail after the 3rd or 4th consecutive hibernation cycle. If after boot-up I allocated more than that, failure was much more likely, usually on the 2nd attempt, and with more than around 6 GB in use it would fail on the first attempt. Those aren't exact values; sometimes it failed regardless of RAM usage, especially in my latest tests, and once it hibernated with more than 11 GB of RAM in use and then failed on the second attempt. So this is all happening pretty randomly - what I described above just covers most cases, and maybe it is simply random anyway.