Bug 30482

Summary: try harder to free enough memory / improve image size autotuning
Product: Power Management
Reporter: Martin Steigerwald (Martin)
Component: Hibernation/Suspend
Assignee: power-management_other
Status: CLOSED DOCUMENTED
Severity: normal
CC: florian, lenb, rjw
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.36
Subsystem:
Regression: No
Bisected commit-id:
Bug Depends on:
Bug Blocks: 7216
Attachments: PM / Hibernate: Refine autoestimation of image size

Description Martin Steigerwald 2011-03-05 14:38:52 UTC
Subject    : does hibernate to disk try hard enough to free memory?
Submitter  : Martin Steigerwald <Martin@lichtvoll.de>
Date       : 2011-02-22 20:59:52
Message-ID : 201102222159.58546.Martin@lichtvoll.de
References : http://marc.info/?l=linux-kernel&m=129840842503720&w=2

On Tuesday 22 February 2011, Martin Steigerwald wrote:
> Hi!
> 
> Since switching to Radeon KMS, my ThinkPad T42 with 2 GB of RAM is often
> not able to allocate memory for the hibernation image. Before KMS
> hibernation only very rarely failed for that reason.
> 
> Often I run without compositing at all as I believe this might spare
> some pages as well. But this doesn't always help.
> 
> It complains that too few pages could be freed. For example with kernel
> 2.6.37:
> 
> Feb 16 00:15:00 shambhala kernel: PM: Creating hibernation image:
> Feb 16 00:15:00 shambhala kernel: PM: Need to copy 186577 pages
> Feb 16 00:15:00 shambhala kernel: PM: Normal pages needed: 114411 + 1024, available pages: 112767
> Feb 16 00:15:00 shambhala kernel: PM: Not enough free memory
> Feb 16 00:15:00 shambhala kernel: PM: Error -12 creating hibernation image
> Feb 16 00:15:00 shambhala kernel: Extended CMOS year: 2000
> Feb 16 00:15:00 shambhala kernel: ACPI: Waking up from system sleep state S4
> Feb 16 00:15:00 shambhala kernel: PM: early recover of devices complete after 0.376 msecs
> 
> This was with:
> 
> martin@shambhala:~> cat /proc/version
> Linux version 2.6.37-tp42-rtime-00004-g9eb63ce (martin@shambhala) (gcc version 4.4.5 (Debian 4.4.5-8) ) #1 PREEMPT Thu Jan 13 10:59:19 CET 2011
> 
> (vanilla with some recursive mtime patches from Jan Kara, but I had
> this with unpatched vanilla 2.6.37, 2.6.36 and 2.6.33 as well)
> 
> Often it is just a few thousand pages that are missing.
> 
> With TuxOnIce - my current kernels do not have TuxOnIce compiled in
> - there was a specific mention of too few lowmem pages.
> 
> A usual situation where it might fail is:
> 
>   shambhala:~> cat /proc/meminfo
>   MemTotal:        2073668 kB
>   MemFree:          257536 kB
>   Buffers:           26480 kB
>   Cached:           916328 kB
>   SwapCached:        27044 kB
>   Active:           795012 kB
>   Inactive:         660528 kB
>   Active(anon):     289056 kB
>   Inactive(anon):   230280 kB
>   Active(file):     505956 kB
>   Inactive(file):   430248 kB
>   Unevictable:           0 kB
>   Mlocked:               0 kB
>   HighTotal:       1187144 kB
>   HighFree:         158040 kB
>   LowTotal:         886524 kB
>   LowFree:           99496 kB
>   SwapTotal:       4000180 kB
>   SwapFree:        3797720 kB
>   Dirty:             10064 kB
>   Writeback:             0 kB
>   AnonPages:        496044 kB
>   Mapped:            64192 kB
>   Shmem:              6604 kB
>   Slab:              85540 kB
>   SReclaimable:      34572 kB
>   SUnreclaim:        50968 kB
>   KernelStack:        2776 kB
>   PageTables:         7412 kB
>   NFS_Unstable:          0 kB
>   Bounce:                0 kB
>   WritebackTmp:          0 kB
>   CommitLimit:     5037012 kB
>   Committed_AS:    1708672 kB
>   VmallocTotal:     122880 kB
>   VmallocUsed:       17320 kB
>   VmallocChunk:      92856 kB
>   DirectMap4k:      897016 kB
>   DirectMap4M:       12288 kB
> 
> 
> When I stop Akonadi (with SQLite, not MySQL as backend) - after having
> closed all other applications of my KDE 4 desktop - hibernation usually
> works.
> 
> But I also found that
> 
> echo 3 >/proc/sys/vm/drop_caches helps.
> 
> echo 1 frees some pages:
> 
>   shambhala:~> echo 1 > /proc/sys/vm/drop_caches
>   shambhala:~> cat /proc/meminfo
>   MemTotal:        2073668 kB
>   MemFree:         1135132 kB
>   Buffers:             512 kB
>   Cached:            71544 kB
>   SwapCached:        27044 kB
>   Active:           339008 kB
>   Inactive:         245784 kB
>   Active(anon):     289060 kB
>   Inactive(anon):   230280 kB
>   Active(file):      49948 kB
>   Inactive(file):    15504 kB
>   Unevictable:           0 kB
>   Mlocked:               0 kB
>   HighTotal:       1187144 kB
>   HighFree:         697688 kB
>   LowTotal:         886524 kB
>   LowFree:          437444 kB
>   SwapTotal:       4000180 kB
>   SwapFree:        3797720 kB
>   Dirty:               444 kB
>   Writeback:             0 kB
>   AnonPages:        496048 kB
>   Mapped:            64228 kB
>   Shmem:              6604 kB
>   Slab:              78844 kB
>   SReclaimable:      27888 kB
>   SUnreclaim:        50956 kB
>   KernelStack:        2776 kB
>   PageTables:         7412 kB
>   NFS_Unstable:          0 kB
>   Bounce:                0 kB
>   WritebackTmp:          0 kB
>   CommitLimit:     5037012 kB
>   Committed_AS:    1708672 kB
>   VmallocTotal:     122880 kB
>   VmallocUsed:       17320 kB
>   VmallocChunk:      92856 kB
>   DirectMap4k:      897016 kB
>   DirectMap4M:       12288 kB
> 
> echo 2 even more:
> 
>   shambhala:~> echo 2 > /proc/sys/vm/drop_caches
>   shambhala:~> cat /proc/meminfo
>   MemTotal:        2073668 kB
>   MemFree:         1209228 kB
>   Buffers:            1204 kB
>   Cached:            71536 kB
>   SwapCached:        27044 kB
>   Active:           339024 kB
>   Inactive:         246456 kB
>   Active(anon):     289064 kB
>   Inactive(anon):   230280 kB
>   Active(file):      49960 kB
>   Inactive(file):    16176 kB
>   Unevictable:           0 kB
>   Mlocked:               0 kB
>   HighTotal:       1187144 kB
>   HighFree:         697688 kB
>   LowTotal:         886524 kB
>   LowFree:          511540 kB
>   SwapTotal:       4000180 kB
>   SwapFree:        3797720 kB
>   Dirty:               380 kB
>   Writeback:             0 kB
>   AnonPages:        496052 kB
>   Mapped:            64228 kB
>   Shmem:              6604 kB
>   Slab:              68036 kB
>   SReclaimable:      17088 kB
>   SUnreclaim:        50948 kB
>   KernelStack:        2776 kB
>   PageTables:         7412 kB
>   NFS_Unstable:          0 kB
>   Bounce:                0 kB
>   WritebackTmp:          0 kB
>   CommitLimit:     5037012 kB
>   Committed_AS:    1708672 kB
>   VmallocTotal:     122880 kB
>   VmallocUsed:       17320 kB
>   VmallocChunk:      92856 kB
>   DirectMap4k:      897016 kB
>   DirectMap4M:       12288 kB
> 
> Thus the machine has about 500 MB of free lowmem and nearly 700 MB of
> free highmem, over 1 GB in total out of 2 GB of RAM, and thus surely
> should be able to hibernate with ease IMHO. Which it then does, too.
> 
> After an echo 3 to drop_caches, hibernation has so far *always* worked,
> even when I left Akregator running - whereas before, the machine often
> would not hibernate with just one application open.
> 
> So if dropping caches *and* dir entries manually helps hibernation
> succeed, why doesn't PM try that hard to free pages in order to let the
> allocation of the hibernation image succeed?
> 
> Are there any other tips? The machine has a whopping 900 MB of caches
> and there is a lot of free memory usually so actually I do not get why
> there are any memory issues at all that prevent successful
> hibernation. I guess that using a 64 bit machine with more memory and
> without that hysterical differentiation between lowmem and highmem pages,
> which I never understood coming from the Amiga, would avoid all those
> issues, but still I'd like to know why it's failing with 2 GB of RAM of
> which 1.5 GB are basically free - unless I drop caches and dir entries
> manually.
> 
> I will now continue to use
> 
> echo 3 > /proc/sys/vm/drop_caches
> 
> prior to hibernation to check whether it really fixed memory issues
> during hibernation.
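
To compare lowmem before and after dropping caches without reading through the full /proc/meminfo dumps, a small helper can pull out a single field. This is only a sketch: `meminfo_kb` is a hypothetical name, not something from the report, and the `LowFree` field only exists on kernels built with highmem support (as on this 32-bit T42). The sample values are the ones quoted above, before dropping caches and after the `echo 1` and `echo 2` steps.

```sh
#!/bin/sh
# Hypothetical helper (not part of the original report): print the value
# in kB of one field from /proc/meminfo-style text read on stdin.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }'
}

# LowFree values quoted in the report, fed in as sample text so the
# sketch also runs on machines without a LowFree field.
before=$(printf 'LowTotal:         886524 kB\nLowFree:           99496 kB\n' | meminfo_kb LowFree)
after=$(printf 'LowTotal:         886524 kB\nLowFree:          511540 kB\n' | meminfo_kb LowFree)

echo "lowmem freed: $((after - before)) kB"
```

On a live system the same helper would presumably be used as `meminfo_kb LowFree < /proc/meminfo`, once before and once after writing to drop_caches.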

Rafael asks:

> What's the value in /sys/power/image_size?

I answer:

> shambhala:~> cat /sys/power/image_size
> 844206080
> 
> Should I try with less?

Rafael instructs me to try with less and to report the minimum value that works in all cases. I tried with:

750000000 - too much:

  Feb 24 23:15:14 shambhala kernel: PM: Marking nosave pages: 000000000009f000 - 0000000000100000
  Feb 24 23:15:14 shambhala kernel: PM: Basic memory bitmaps created
  Feb 24 23:15:14 shambhala kernel: PM: Syncing filesystems ... done.
  Feb 24 23:15:34 shambhala kernel: Freezing user space processes ... (elapsed 0.06 seconds) done.
  Feb 24 23:15:34 shambhala kernel: Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
  Feb 24 23:15:34 shambhala kernel: PM: Preallocating image memory... done (allocated 327472 pages)
  Feb 24 23:15:34 shambhala kernel: PM: Allocated 1309888 kbytes in 10.96 seconds (119.51 MB/s)
  [...]
  Feb 24 23:15:34 shambhala kernel: PM: freeze of devices complete after 514.508 msecs
  Feb 24 23:15:34 shambhala kernel: PM: late freeze of devices complete after 0.556 msecs
  Feb 24 23:15:34 shambhala kernel: ACPI: Preparing to enter system sleep state S4
  Feb 24 23:15:34 shambhala kernel: PM: Saving platform NVS memory
  Feb 24 23:15:34 shambhala kernel: Extended CMOS year: 2000
  Feb 24 23:15:34 shambhala kernel: PM: Creating hibernation image:
  Feb 24 23:15:34 shambhala kernel: PM: Need to copy 198586 pages
  Feb 24 23:15:34 shambhala kernel: PM: Normal pages needed: 115478 + 1024, available pages: 111700
  Feb 24 23:15:34 shambhala kernel: PM: Not enough free memory
  Feb 24 23:15:34 shambhala kernel: PM: Error -12 creating hibernation image
  
  
With Dolphin, KMail, Kontact *and* Akregator, but still quite some free RAM:
  
  martin@shambhala:~> free -m
               total       used       free     shared    buffers     cached
  Mem:          2025       1107        917          0         51        130
  -/+ buffers/cache:        924       1100
  Swap:         3906       1318       2588


730000000 - too much:

  Mar  3 23:40:42 shambhala kernel: PM: freeze of devices complete after 564.161 msecs
  Mar  3 23:40:42 shambhala kernel: PM: late freeze of devices complete after 0.466 msecs
  Mar  3 23:40:42 shambhala kernel: ACPI: Preparing to enter system sleep state S4
  Mar  3 23:40:42 shambhala kernel: PM: Saving platform NVS memory
  Mar  3 23:40:42 shambhala kernel: Extended CMOS year: 2000
  Mar  3 23:40:42 shambhala kernel: PM: Creating hibernation image:
  Mar  3 23:40:42 shambhala kernel: PM: Need to copy 187204 pages
  Mar  3 23:40:42 shambhala kernel: PM: Normal pages needed: 128270 + 1024, available pages: 98908
  Mar  3 23:40:42 shambhala kernel: PM: Not enough free memory
  Mar  3 23:40:42 shambhala kernel: PM: Error -12 creating hibernation image

Thus I am now trying 710000000. So far so good. Let's see how it goes.
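
To see how far off each failed attempt was, the shortfall can be computed from the "Normal pages needed" log lines as needed + reserve - available. The following is a rough sketch, assuming the log format shown in the excerpts above; `pm_shortfall` is a hypothetical name introduced only for illustration.

```sh
#!/bin/sh
# Hypothetical helper: for each line of the form
#   "PM: Normal pages needed: N + R, available pages: A"
# read on stdin, print how many pages were missing (N + R - A).
# Lines that do not match are ignored.
pm_shortfall() {
    sed -n 's/.*Normal pages needed: \([0-9]*\) + \([0-9]*\), available pages: \([0-9]*\).*/\1 \2 \3/p' |
    while read -r needed reserve available; do
        echo "$((needed + reserve - available))"
    done
}

# The two failed attempts quoted above (image_size 750000000 and 730000000).
printf '%s\n' \
    'PM: Normal pages needed: 115478 + 1024, available pages: 111700' \
    'PM: Normal pages needed: 128270 + 1024, available pages: 98908' |
    pm_shortfall
```

For the two attempts above this gives 4802 and 30386 pages. On a real system the input would presumably come from the kernel log, for example by piping `dmesg` into the function.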

This was all with

martin@shambhala:~/Linux/Kernel/Mainline/Bugs/hibernate-imagesize> cat proc-version.txt 
Linux version 2.6.37-tp42-rtime-00004-g9eb63ce (martin@shambhala) (gcc version 4.4.5 (Debian 4.4.5-8) ) #1 PREEMPT Thu Jan 13 10:59:19 CET 2011

Now I continue testing with 2.6.38-rc7-tp42-00142-g212e349 (same compiler).

I test with Debian Wheezy/Sid/Experimental on my ThinkPad T42 with 2 GB of RAM and 4 GB of Swap.
Comment 1 Martin Steigerwald 2011-03-05 14:51:14 UTC
Reported the hang as:

Bug #30492 - kernel sometimes hangs on hibernation

BTW, this is all with in-kernel suspend via the hibernate script. I am using the following wrapper script for some of my own stuff:

shambhala:/etc> cat acpi/hibernate-extra.sh 
#!/bin/sh

# To be safe, write out all pending changes right at the start
sync

# Try to free as many lowmem pages as possible
# Ext4 apparently puts dir entries into lowmem as well
# And with too few lowmem pages, hibernation with
# Radeon DRM KMS does not work.
#echo 3 > /proc/sys/vm/drop_caches

# Alternatively build a smaller image, see LKML:
# Re: does hibernate to disk try hard enough to free memory? (23.2.2011)
echo 710000000 > /sys/power/image_size

# Put NetworkManager to sleep
# see /usr/lib/pm-utils/sleep.d/55NetworkManager
dbus-send --print-reply --system                        \
        --dest=org.freedesktop.NetworkManager \
        /org/freedesktop/NetworkManager       \
        org.freedesktop.NetworkManager.sleep

# Stop ifplugd
#/etc/init.d/ifplugd stop
#ifdown eth0

# Save the system time to the hardware clock
/etc/init.d/hwclock.sh stop

# Stop uptimed so that it writes its records
/etc/init.d/uptimed stop

# To be safe, write out all pending changes once more here
sync

# Good night
# /etc/acpi/hibernate.sh
#echo 1 > /sys/power/tuxonice/do_hibernate
#pm-suspend-hybrid
#pm-hibernate
hibernate-disk

# Start uptimed again. It writes the records again when doing so
/etc/init.d/uptimed start

# Write the records out right away
sync

# Set the hard disk parameters again
/etc/init.d/hdparm start

# Set the system time from the hardware clock again
/etc/init.d/hwclock.sh start

# Wake up NetworkManager
dbus-send --print-reply --system                        \
        --dest=org.freedesktop.NetworkManager \
        /org/freedesktop/NetworkManager       \
        org.freedesktop.NetworkManager.wake

# Start ifplugd
#/etc/init.d/ifplugd start
Comment 2 Rafael J. Wysocki 2011-03-05 21:24:39 UTC
Created attachment 50132 [details]
PM / Hibernate: Refine autoestimation of image size

Please check if this patch helps.
Comment 3 Martin Steigerwald 2011-03-06 19:12:57 UTC
Trying with patch now. Gives

martin@shambhala:~> cat /sys/power/image_size
703229952

instead of:

> shambhala:~> cat /sys/power/image_size
> 844206080

Well one third instead of two fifths ;).
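
The "one third instead of two fifths" remark can be checked with a little arithmetic. The sketch below assumes the MemTotal of 2073668 kB from the /proc/meminfo output in the description (the patched kernel may report a slightly different total), and `image_size_percent` is a hypothetical helper name.

```sh
#!/bin/sh
# Hypothetical helper: express an image_size value (bytes) as a
# percentage of MemTotal (kB, as shown in /proc/meminfo).
image_size_percent() {
    # $1 = image_size in bytes, $2 = MemTotal in kB
    awk -v size="$1" -v total_kb="$2" \
        'BEGIN { printf "%.1f\n", 100 * size / (total_kb * 1024) }'
}

# Old default, then the default with the patch applied.
image_size_percent 844206080 2073668
image_size_percent 703229952 2073668
```

With these inputs the old default comes out at roughly 39.8 % of RAM (about two fifths) and the patched default at roughly 33.1 % (about one third).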
Comment 4 Martin Steigerwald 2011-03-14 23:03:22 UTC
Well, it seems to work now. I hibernated even with Iceweasel as well as KMail and Kontact open. I will do some more ridiculous attempts, but until any issues show up, this new default seems to do fine here. Thanks.
Comment 5 Florian Mickler 2011-03-28 23:18:27 UTC
A patch referencing this bug report has been merged in v2.6.38-8876-g036a982:

commit bea3864fb627d110933cfb8babe048b63c4fc76e
Author: Rafael J. Wysocki <rjw@sisk.pl>
Date:   Tue Mar 15 00:45:46 2011 +0100

    PM / Hibernate: Reduce autotuned default image size
Comment 6 Martin Steigerwald 2011-03-29 09:13:29 UTC
I have been testing with the patch for quite a while. It is still possible to provoke a failed hibernation attempt that has to be retried, but it has become less likely. Is there a way to get a more fundamental fix? I have the following ideas:

1) Can it be related to lowmem pages? Then a patch that explicitly shows lowmem / highmem usage could help in diagnosing this. I might even be able to write such a patch, provided that it's easy to get that info (I do not know kernel programming).

2) It seems to be related to driver allocations. Since I have had this issue ever since switching my Radeon graphics setup to KMS, it is likely the KMS / DRM driver in the kernel causing bigger allocations than before, when hibernation at times worked even with two KDE 4 sessions. Does it make sense to open a bug report and ask whether the radeon KMS / DRM driver could free some pages upon hibernation? Or does it only allocate pages during that time that are absolutely needed?

Any advice on how to proceed further?
Comment 7 Rafael J. Wysocki 2011-04-23 19:19:06 UTC
The problem is a consequence of bugs in device drivers that shouldn't allocate
memory in their suspend/resume routines _at_ _all_.

So, a more fundamental fix would be to modify drivers so that they use
suspend/hibernate notifiers for allocating memory.  IOW, you should complain
to the developers of the drivers that cause problems to happen.
Comment 8 Martin Steigerwald 2011-04-29 15:45:30 UTC
Is there a way for me to determine which drivers are involved?

Well, since I have had this ever since switching to radeon KMS, it is probably just the radeon DRM/KMS driver.
Comment 9 Martin Steigerwald 2011-04-29 16:07:28 UTC
Well I reported it as:

Bug #34102 - radeon drm/kms: please use suspend/hibernate notifiers for allocating memory in suspend routines
Comment 10 Florian Mickler 2011-05-30 08:17:02 UTC
A patch referencing a commit referencing this bug report has been merged in v3.0-rc1:

commit 1c1be3a949a61427a962771c85a347c822aeb991
Author: Rafael J. Wysocki <rjw@sisk.pl>
Date:   Sun May 15 11:39:48 2011 +0200

    Revert "PM / Hibernate: Reduce autotuned default image size"