Bug 176461 - Suspend to Disk ineffective with high RAM usage
Summary: Suspend to Disk ineffective with high RAM usage
Status: RESOLVED CODE_FIX
Alias: None
Product: Power Management
Classification: Unclassified
Component: Hibernation/Suspend
Hardware: Intel Linux
Importance: P1 high
Assignee: Chen Yu
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-10-06 00:31 UTC by doa379
Modified: 2017-06-25 17:09 UTC
6 users

See Also:
Kernel Version: 4.4, 4.7, 4.8
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description doa379 2016-10-06 00:31:13 UTC
There is a major issue with the suspend-to-disk feature in the kernel. High RAM usage (application data, user data and caches combined) causes the system to stall when suspending to disk, even with a swap partition twice the size of RAM.

[code]
# free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        9.6G        3.4G        841M        2.6G        4.8G
Swap:           30G          0B         30G
[/code]
Comment 1 doa379 2016-10-06 08:48:39 UTC
By high RAM usage I mean 30-40% of total RAM in use, including buffers and caches.

There are two faults.

1. There is still a memory compaction issue in the kernel.

2. The kernel creates a complete hibernation image in RAM before writing it out to the swap area. Thus only around half the amount of RAM can be suspended to disk. This is a major flaw, as the system should write to swap while the image is being generated. No amount of swap space will resolve this issue.
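To put a number on point 2: under the model described here (the snapshot of used pages must itself fit into the RAM left free), hibernation can only succeed while usage is at or below roughly half of total RAM. A minimal sketch of that arithmetic, with figures mirroring the `free -h` output above (the figures and the 50% assumption are illustrative, not taken from the kernel source):

```shell
# Back-of-the-envelope check for the "image built fully in RAM" model.
# Assumption (illustrative, not from the kernel source): the snapshot
# must fit in the RAM that remains free, so usage above ~50% of total
# cannot be saved.
total_kb=16252928   # ~15.5 GiB total RAM (illustrative)
used_kb=10066330    # ~9.6 GiB in use (illustrative)

if [ "$used_kb" -le $((total_kb / 2)) ]; then
    echo "snapshot fits in free RAM"
else
    echo "snapshot does NOT fit in free RAM"
fi
```

With the figures above this prints "snapshot does NOT fit in free RAM", which matches the failure mode reported.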
Comment 2 Chen Yu 2016-10-10 13:29:13 UTC
(In reply to doa379 from comment #0)
> There is a major issue with the suspend-to-disk feature in the kernel. High
> RAM usage (consisting of application data, user data, caches combined)
> results in the system stalling when suspending to disk. This is even with a
> swap partition size 2x RAM.
Is there any warning when the system is "stalling"? I guess the system is trying to reclaim some pages at that time.
> By high RAM usage, this is 30-40% of total RAM usage including buffers and
> caches.
> 
> There are two faults.
> 
> 1. There is still a memory compaction issue in the kernel.
The lzo compression only reduces the stored size on the swap device, not the snapshot in memory.
> 
> 2. The kernel creates a complete hibernation image in RAM before scratching
> it to the swap  area. Thus, only around half the amount of RAM can be
> suspended to disk. This is a major flaw as the system should write to swap
> as the image is being generated. No amount of swap space will resolve this
> issue.
Currently the algorithm reserves about 1/2 of the RAM for snapshot creation, and only after that writes the snapshot to the swap device. I think it would be hard to write the image to disk before the snapshot (image) is fully created, as you suggest, because then the system would not know which pages should be saved to disk: while you are writing data to the disk you are also creating page cache/buffers, and in theory hibernation should not save these newly created pages. That requires a method to distinguish these special page caches from normal IO pages; you might have to introduce a new page flag such as GFP_NO_SAVE, and that would be overkill IMO.

I think it's better to look first at why the system stalls when suspending to disk.
Comment 3 doa379 2016-10-10 19:00:23 UTC
When the kernel is compiled with the pm debugging option, during hibernation under the conditions described the warning message generated is "Write error to swap device". Under these conditions the screen goes blank and the system just waits, echoing this error, and doesn't power off.

Hibernation works perfectly otherwise as long as the total RAM usage is << 50%.

So after fudging around with this and adopting zram for VM compression, I ended up realising that the system is indeed using 1/2 the RAM for snapshot creation. This, I feel, is not a good implementation. What about a system which needs to hibernate when RAM usage is at 80% or 90%? It just wouldn't work. The feature must be solid and reliable. I personally am a fairly heavy user of redundant RAM as tmpfs. The system runs continuously, and following an interruption it must resume where it left off.

One method I suggest is to create a meta-snapshot, in a small amount of reserved RAM or an initial swap area, consisting of the relevant pages of the image at the point of system freeze, then write only the pages identified in the meta-snapshot directly to swap thereafter.
Comment 4 Chen Yu 2016-10-11 16:41:04 UTC
(In reply to doa379 from comment #3)
> When the kernel is compiled with the pm debugging option during hibernation
> under the conditions described, the warning messages generated are "Write
> error to swap device". Under these conditions the screen goes blank and
> system just waits echoing this error and doesn't poweroff.
> 
> Hibernation works perfectly otherwise as long as the total RAM usage is <<
> 50%.
> 
> So after fudging around with this and adopting Zram for VM compression I
> ended up realising that the system is indeed using 1/2 the RAM for snapshot
> creation. This I feel is not a good implementation. What if there is a
> system which uses hibernation when RAM usage is at 80% or 90%? It just
> wouldn't work. The feature must be solid and reliable. I personally am
> fairly heavy user of redundant RAM as tmpfs. The system runs continuously
> and following an interruption the system must resume where it left off.
> 
> One method I suggest is to create a meta-snapshot in a small amount of
> reserved RAM or initial swap area, of the image consisting of relevant pages
> at the point of system freeze. Then writing only those pages identified in
> the meta-snapshot directly to swap thereafter.
This is similar to what Linux does in the current implementation: after the snapshot has finally been created, the extra pages which are not used are freed. So by the time the system reaches the write-to-swap stage, all the extra pages from the earlier preallocation (50% of memory) have already been released. That is to say, although a large number of pages may be preallocated for snapshot creation, once the snapshot has been generated, some of the preallocated pages are returned to the buddy system.

Not sure why 'Write error to swap device' is printed; is this the full log?
I cannot find this error message in my code :( Is there any error return number?
Comment 5 doa379 2016-10-11 21:26:06 UTC
I'm sorry I was not clear; in my previous post I just reiterated what was said earlier. Let me clarify.

By creating a meta-snapshot, what I'm proposing is to create an index of only the relevant page addresses, then commit only the pages in that index sequentially to swap. This eliminates the question of committing additional pages at the moment the system actually writes to swap.

We definitely can't have the current situation whereby it's only possible to suspend to disk half the total amount of RAM, even in principle. Not only is the idea short-sighted, it also lacks quality. Another question is how to read that memory map back into RAM when resuming.

I'm trying to find the precise error message in the source, but I can't find it either. It's not a big deal, as the error was quite likely a generic error string. The error messages in the power modules aren't precise enough to pinpoint the exact bug in the code; I had to work backwards to this point through a very time-consuming process of elimination. I think we need to look closely again at the actual model used in the code to suspend to disk.
Comment 6 Rafael J. Wysocki 2016-10-11 21:42:02 UTC
I can't reproduce this problem, so I think that your configuration is somehow special.

However, from your description it looks like this patch may help:

https://patchwork.kernel.org/patch/6726941/

Can you please try it?
Comment 7 doa379 2016-10-12 23:20:01 UTC
I looked at this patch and applied it to kernel 4.8, but unfortunately there was no apparent change in behaviour. The same symptoms arose as before.

After issuing the suspend-to-disk command (s2disk, say), the console immediately reports "Snapshotting system..", then all devices go off, but the system doesn't continue with the message "Saving image pages to swap.. %" as it normally should. Instead the system remains in a stalled state, with power on to the main unit but all devices off and no data written to swap.

I should point out that this affliction more commonly arises when RAM consumption is above 35-50% and when suspending to disk for the first time after booting (namely with a fresh, empty swap). So, as a corollary, this is a buffer/cache issue, because if the buffers are offset to swap there is less demand for free RAM. In an earlier experiment to test this hypothesis I set the swappiness factor to 100, but that didn't help either, as the system only offsets data to swap when RAM consumption is around the 85-90% mark. Almost all testing has been in the 60-70% ballpark. Still, I'm not convinced that squeezing out every last bit of free memory by attempting to free up buffers and caches is the correct approach.
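For anyone reproducing the swappiness experiment described above, these are the stock sysctl/sysfs knobs involved (a sketch; requires root, and the values are only illustrative — note that s2disk itself goes through the userspace uswsusp path rather than /sys/power/state):

```shell
# Bias the VM toward swapping anonymous pages out before hibernating
# (the swappiness-100 experiment described above).
sysctl vm.swappiness=100

# Ask the hibernation code to aim for the smallest possible image.
# /sys/power/image_size is in bytes; 0 means "as small as possible".
echo 0 > /sys/power/image_size

# Trigger the in-kernel suspend-to-disk path.
echo disk > /sys/power/state
```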
Comment 8 Hussam Al-Tayeb 2016-10-13 08:09:37 UTC
doa379, what graphics device do you use?
I only have this issue when using nouveau. It's not an issue when using the proprietary NVIDIA driver. It seems that with open-source graphics drivers, the kernel pushes a copy of the used dedicated video memory into physical RAM (instead of swap) when attempting to suspend to disk. When it fails to do so due to high RAM usage, it fails to hibernate.
We could of course be facing different issues.
Comment 9 doa379 2016-10-13 17:13:48 UTC
Video is integrated into Intel chip. But the same issue occurs on three different hardware brands.
Comment 10 Rainer Fiebig 2017-02-26 18:16:09 UTC
(In reply to doa379 from comment #7)
snip
> 
> I should point out that this affliction more commonly arises if RAM
> consumption is above 35-50% and when suspending-to-disk for the first time
> after booting (namely with a fresh empty swap). So as a corollary, this is a
snip

This implies that s2disk fails when memory-usage is < 50% which does not really support your hypothesis.

So try to rule out other possibilities for failure, for instance this one: https://bugzilla.kernel.org/show_bug.cgi?id=97201.

At least high memory load per se does not prevent s2disk from succeeding on my system:

> free
             total       used       free     shared    buffers     cached
Mem:      11950352   11107920     842432     399252    1859192    1794676
-/+ buffers/cache:    7454052    4496300
Swap:     10713084     248084   10465000


> dmesg -t
...
PM: Preallocating image memory... 
nr_mapped = 680612 pages, 2658 MB
active_inactive(file) = 738036 pages, 2882 MB
nr_sreclaimable = 44240 pages, 172 MB
active_inactive(anon) = 1444073 pages, 5640 MB
nr_shmem = 132118 pages, 516 MB
save_highmem = 0 pages, 0 MB
saveable = 2904297 pages, 11344 MB
highmem = 0 pages, 0 MB
additional_pages = 220 pages, 0 MB
avail_normal = 3061329 pages, 11958 MB
count = 3023173 pages, 11809 MB
max_size = 1510452 pages, 5900 MB
user_specified_image_size = 1344414 pages, 5251 MB
adjusted_image_size = 1344415 pages, 5251 MB
minimum_pages = 1358560 pages, 5306 MB
target_image_size = 1358560 pages, 5306 MB
preallocated_high_mem = 0 pages, 0 MB
to_alloc = 1512721 pages, 5909 MB
to_alloc_adjusted = 1512721 pages, 5909 MB
pages_allocated = 1512721 pages, 5909 MB
done (allocated 1365648 pages)
PM: Allocated 5462592 kbytes in 17.21 seconds (317.40 MB/s)
...
PM: Need to copy 1353731 pages
PM: Hibernation image created (1353731 pages copied)
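As a cross-check on the ~50% figure being discussed, `max_size` in the log above sits just under half of `count`. A quick sketch of that arithmetic, with values copied verbatim from the dmesg excerpt (in pages; reading the small gap as a reserve is an assumption, not something the log states):

```shell
# Values from the dmesg excerpt above, in pages.
count=3023173       # pages the kernel counts as usable
max_size=1510452    # the cap it places on the image

half=$((count / 2))          # the ~50% ceiling
delta=$((half - max_size))   # small shortfall below exactly half
echo "half=$half delta=$delta"
```

This prints "half=1511586 delta=1134": the cap is about half of `count`, minus a small margin.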
Comment 11 Hussam Al-Tayeb 2017-02-26 18:39:40 UTC
I upgraded to a new machine with 8 GB RAM but moved the old graphics card to it (2 GB dedicated video memory).
Now as long as I don't exceed 6 GB of RAM usage, I can suspend to disk. Anything higher results in failures. This is using nouveau. The proprietary NVIDIA driver doesn't suffer from this because it seems (my own uneducated observation) to leak the video memory into user-space applications on suspend to disk.
Comment 12 Rainer Fiebig 2017-02-26 20:41:21 UTC
(In reply to Hussam Al-Tayeb from comment #11)
> I upgraded to a new machine with 8GB ram but moved the old graphics card to
> it (2GB video memory dedicated).
> Now as long as I don't exceed 6GB of ram usage, I can suspend to disk.

Now we have a 75%-limit. ;)

> Anything higher results in failures. This is using nouveau. Proprietary
> nvidia driver doesn't suffer from this because it seems (my own uneducated
> observation) to leak the video memory into user space applications on
> suspend to disk.

Use the proprietary driver then, if it gives you better results.

The OP's system has integrated graphics (same here), so that's probably not the issue.

But if he uses VirtualBox-VMs, NR_FILE_MAPPED will be high which can cause s2disk to fail unnecessarily, even if memory-usage is significantly < 50%.
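NR_FILE_MAPPED is visible on a running system as the `nr_mapped` counter in `/proc/vmstat`, reported in pages. As a sketch, converting the figure from the comment-10 log to MiB (assuming 4 KiB pages):

```shell
# nr_mapped from the comment-10 log, counted in 4 KiB pages.
pages=680612
mib=$((pages * 4 / 1024))
echo "nr_mapped: $pages pages = $mib MiB"

# On a live system, read the current value with:
#   grep '^nr_mapped' /proc/vmstat
```

This prints "nr_mapped: 680612 pages = 2658 MiB", matching the "2658 MB" shown in the comment-10 log.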
Comment 13 Hussam Al-Tayeb 2017-02-26 20:47:19 UTC
Are you sure the issue is 50% or 75%? It could simply be hiccuping when the remaining unused RAM is less than the allocated video memory.
Comment 14 Rainer Fiebig 2017-02-26 21:20:43 UTC
(In reply to Hussam Al-Tayeb from comment #13)
> Are you sure the issue is 50% or 75%? it could simply be hiccuping when
> remainder of unused ram is less than allocated video memory.

I really don't know whether a 50%-, 75%- or whatever%-limit does exist. If you take a close look at the example in comment 10 you see that almost all memory was used up and s2disk still succeeded. Here's another example: https://bugzilla.kernel.org/show_bug.cgi?id=47931#c47

But from comment 7 I learned that the OP at times encounters failures even when memory usage is below 50%, and in that case a 50% limit cannot be the reason. So it's better to also consider other possibilities.
Comment 15 Manuel Krause 2017-06-25 11:15:31 UTC
@doa379:
You've recently marked this bug RESOLVED + CODE_FIX.
I now wonder which code fix you're referring to. Was it the patch from comment 6, some in-kernel change or something else? 
Thank you in advance for clarifying!
Comment 16 Manuel Krause 2017-06-25 12:13:17 UTC
@Jay/ Rainer Fiebig:
A not really off-topic question regarding your previous work: Do you still have both of the two fixes in use?

* the older one from BUG 47931 (2014): https://bugzilla.kernel.org/attachment.cgi?id=155521

* the newer one from BUG 97201: https://bugzilla.kernel.org/attachment.cgi?id=255931

Or only the newer or additional other ones?

Best regards
Comment 17 doa379 2017-06-25 13:54:11 UTC
Don't you read the entire thread or do you just read the last message?

See comment 12. I have reason to believe this is fixed. I have tested kernel 4.9 extensively and have not come across this issue again. Second, kernels 4.7 and 4.8 are obsolete (EOL).

So now Closed.
Comment 18 Manuel Krause 2017-06-25 15:22:32 UTC
(In reply to doa379 from comment #17)
> Don't you read the entire thread or do you just read the last message?
> 
> See Comment 12. I have reason to believe this is fixed. I have tested Kernel
> 4.9 extensively and have not come across this issue again. Second, Kernels
> 4.7, 4.8 are obsolete (EOL).
> 
> So now Closed.

Of course I've read the entire thread (it would be worthless without doing so). But I haven't found any statement about how, where and when it got fixed (not even in comment 12). That's why I wanted the additional info from you. The EOL of the mentioned kernels doesn't matter for the issue itself.

So you'd say kernel 4.9 works well with regard to this topic? What kernel do you use at the moment? Have you already tried 4.11?
Comment 19 doa379 2017-06-25 15:30:37 UTC
4.9 is good. Earlier versions of 4.11 also ok. Not sure about 4.11.5, 4.11.6 -- still testing.
Comment 20 Manuel Krause 2017-06-25 16:56:27 UTC
It's quite time-consuming to collect "all" the relevant info on this topic from bugzilla, spread over multiple reports, several of them unfortunately closed and therefore invisible.
In my case Firefox with many open tabs is a memory hog, and I additionally often use /dev/shm as a ramdisk. 8 GB RAM, 3 GB shm, 13 GB swap, integrated Intel GFX. The longer the uptime and the more hibernations, the more likely crashes become, and at the least segfaults in random processes.
Also, with 4.11.7 the resume from hibernation takes ages to get back to a responsive (KDE) desktop with a responsive Firefox.
I didn't find the current TuxOnIce code reliable enough (at most 3 hibernations until failure) :-( and am now testing both fixes from JRF (comment 16) together on kernel 4.11.7.
The first 3 repeated hibernations with different memory loads are promising, btw.
Comment 21 Manuel Krause 2017-06-25 17:09:35 UTC
(In reply to Rafael J. Wysocki from comment #6)
> I can't reproduce this problem, so I think that your configuration is
> somehow special.
> 
> However, from your description it looks like this patch may help:
> 
> https://patchwork.kernel.org/patch/6726941/
> 
> Can you please try it?

Even though this bug is closed now and the suggested patch is quite old, would you still like it to be tested on a more recent kernel like 4.11.7?
