If NR_FILE_MAPPED is high (e.g. when using VirtualBox), s2disk/s2both may fail. The characteristics are similar to Bug 47931 and have been discussed there at length (see Comment 50 and following). For a possible starting point to a solution see my Comment 72.
Just for the record: My solution (i.e. excluding NR_FILE_MAPPED from the calculation of "size" in minimum_image_size()) has been working flawlessly for many months now, even in high-load cases like the following, where Mapped was almost 4 GB:

> free
             total       used       free     shared    buffers     cached
Mem:      11997692   11272020     725672     317572    1404316    1673400
-/+ buffers/cache:    8194304    3803388
Swap:     10713084     780376    9932708

The impeccable reliability that the modification brought to s2disk (and the problems without it) is proof enough for me that the current way of calculating minimum_image_size is wrong - or at least suboptimal:

static unsigned long minimum_image_size(unsigned long saveable)
{
	unsigned long size;

	size = global_page_state(NR_SLAB_RECLAIMABLE)
		+ global_page_state(NR_ACTIVE_ANON)
		+ global_page_state(NR_INACTIVE_ANON)
		+ global_page_state(NR_ACTIVE_FILE)
		+ global_page_state(NR_INACTIVE_FILE)
		- global_page_state(NR_FILE_MAPPED);

	return saveable <= size ? 0 : saveable - size;
}

According to the function comment, "size" is the "number of pages that can be freed in theory". So the reasoning for subtracting NR_FILE_MAPPED seems to be that a certain amount of non-freeable pages is hidden in the other summands and must be weeded out this way. Otherwise subtracting NR_FILE_MAPPED wouldn't make sense.

As long as NR_FILE_MAPPED is relatively small, subtracting it doesn't have much impact anyway. But it becomes a problem and leads to unnecessary failure if NR_FILE_MAPPED is high - like when using VirtualBox.

Is the bug relevant? Let's assume there are 1 million people using Linux as their desktop OS. 10 % of them use s2disk, and of those 10 % use an app that causes a high NR_FILE_MAPPED. That would be 10,000 users having a suboptimal experience with s2disk in Linux.

Jay
Over a year now of reliable s2disk/s2both for me.

The following values illustrate the problem with the original function. They were taken shortly after resuming from hibernation. It was just a normal use case, only a few apps open plus a 2-GB VirtualBox VM. But as you can see, "size" would be negative and s2disk would probably fail:

                            kb        Pages
nr_slab_reclaimable     68.368       17.092
nr_active_anon         904.904      226.226
nr_inactive_anon       436.112      109.028
nr_active_file         351.320       87.830
nr_inactive_file       163.340       40.835
freeable:            1.924.044      481.011
nr_mapped            2.724.140      681.035
freeable – mapped:    -800.096     -200.024

--------------------------------------------------------

~> cat /proc/meminfo
MemTotal:       11998028 kB
MemFree:         7592344 kB
MemAvailable:    7972260 kB
Buffers:          229960 kB
Cached:           730140 kB
SwapCached:       133868 kB
Active:          1256224 kB
Inactive:         599452 kB
Active(anon):     904904 kB
Inactive(anon):   436112 kB
Active(file):     351320 kB
Inactive(file):   163340 kB
Unevictable:          60 kB
Mlocked:              60 kB
SwapTotal:      10713084 kB
SwapFree:        9850232 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:        847876 kB
Mapped:          2724140 kB
Shmem:            445440 kB
Slab:             129984 kB
SReclaimable:      68368 kB
SUnreclaim:        61616 kB
KernelStack:        8128 kB
PageTables:        53692 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    16712096 kB
Committed_AS:    6735376 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      578084 kB
VmallocChunk:   34359117432 kB
HardwareCorrupted:     0 kB
AnonHugePages:    276480 kB
HugePages_Total:       0
HugePages_Free:        0
Hi Jay, I looked at your proposal at Bug 47931, if I understand correctly, you suggested removing the NR_FILE_MAPPED in minimum_image_size, and in order to get rid of page cache, shrink_all_memory(saveable) try to reclaim as much as possible page cache? shrink_all_memory might not touch dirty page caches and it only asks flusher thread to write these datas to disk, so after the shrink_all_memory finished, there might be still many NR_FILE_MAPPED in the system, so I think we should firstly invoke sys_sync IMO. Yu
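In code, the ordering being suggested would be roughly this, inside hibernate_preallocate_memory() (only a sketch of the idea, not a tested patch; whether sys_sync() is appropriate to call at this point in snapshot.c is exactly the open question):

	/* sketch: flush dirty page cache first, so shrink_all_memory() can
	 * actually drop those pages instead of only queueing writeback */
	sys_sync();

	nr_reclaimed = shrink_all_memory(saveable - size);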
(In reply to Chen Yu from comment #3) > Hi Jay, > I looked at your proposal at Bug 47931, if I understand correctly, you > suggested removing the NR_FILE_MAPPED in minimum_image_size, and in order to > get rid of page cache, shrink_all_memory(saveable) try to reclaim as much as > possible page cache? > shrink_all_memory might not touch dirty page caches and it only asks flusher > thread to write these datas to disk, so after the shrink_all_memory > finished, there might be still many NR_FILE_MAPPED in the system, so I > think we should firstly invoke sys_sync IMO. > > Yu Hi and thanks for taking a look! I found out that excluding NR_FILE_MAPPED in minimum_image_size() is all that's necessary for fixing this bug. So I left shrink_all_memory(saveable - size) in hibernate_preallocate_memory() unchanged. In essence, my reasoning for removing NR_FILE_MAPPED from minimum_image_size() goes like this: - "saveable" is all that's in memory at that time - subtracting those items from saveable that can be freed, yields the minimum image size - but NR_FILE_MAPPED is part of that fraction of saveable that can *not* be freed (according to what I'm observing here) - so subtracting it from those that can be freed leads to wrong results and unnecessary failure The assumption is of course that NR_SLAB_RECLAIMABLE etc. are truly free-able. If I'm understanding you rightly, you are suggesting to do a sys_sync before shrink_all_memory(saveable - size)? Would that be for added stability or as a way to reduce NR_FILE_MAPPED? Jay
Hi Jay,

(In reply to Jay from comment #4)
> In essence, my reasoning for removing NR_FILE_MAPPED from
> minimum_image_size() goes like this:
> - "saveable" is all that's in memory at that time
Got
> - subtracting those items from saveable that can be freed, yields the
> minimum image size
Got
> - but NR_FILE_MAPPED is part of that fraction of saveable that can *not* be
> freed (according to what I'm observing here)
Got
> - so subtracting it from those that can be freed leads to wrong results and
> unnecessary failure
Mapped might have a 'common set' with Active(file) and Inactive(file), so the original code wants to exclude it from the reclaimable pages. In most cases, Mapped might be a subset of the latter. However, if more than one task has mapped the same file, the count of Mapped will grow bigger and bigger. In your case, I guess more than one task has executed the following code:

fd = open("my_file.dat");
addr = mmap(fd, offset, size);
memcpy(dst, addr, size);

How about something like this:

static unsigned long minimum_image_size(unsigned long saveable)
{
	unsigned long size, nr_map;

	size = global_page_state(NR_SLAB_RECLAIMABLE)
		+ global_page_state(NR_ACTIVE_ANON)
		+ global_page_state(NR_INACTIVE_ANON)
		+ global_page_state(NR_ACTIVE_FILE)
		+ global_page_state(NR_INACTIVE_FILE);
	nr_map = global_page_state(NR_FILE_MAPPED);
	size = size > nr_map ? (size - nr_map) : 0;

	return saveable <= size ? 0 : saveable - size;
}
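As an aside, the open/mmap/memcpy sketch above can be turned into a small self-contained test program (the file name is made up and error handling is kept minimal; the per-page read loop stands in for the memcpy). Every page of the mapping that gets touched shows up in "Mapped:" in /proc/meminfo for as long as the mapping exists:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
	const char *path = "my_file.dat";	/* made-up example file */
	struct stat st;
	int fd = open(path, O_RDONLY);

	if (fd < 0 || fstat(fd, &st) < 0) {
		perror(path);
		return 1;
	}

	char *addr = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	if (addr == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* touch every page so the mapping is actually faulted in */
	volatile char sum = 0;
	for (off_t off = 0; off < st.st_size; off += 4096)	/* assumes 4 kB pages */
		sum += addr[off];

	printf("mapped %lld bytes; check Mapped: in /proc/meminfo, then press Enter\n",
	       (long long)st.st_size);
	getchar();

	munmap(addr, st.st_size);
	close(fd);
	return 0;
}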
(In reply to Chen Yu from comment #5) > Hi Jay, > (In reply to Jay from comment #4) > > In essence, my reasoning for removing NR_FILE_MAPPED from > > minimum_image_size() goes like this: > > - "saveable" is all that's in memory at that time > Got > > - subtracting those items from saveable that can be freed, yields the > > minimum image size > Got > > - but NR_FILE_MAPPED is part of that fraction of saveable that can *not* be > > freed (according to what I'm observing here) > Got > > - so subtracting it from those that can be freed leads to wrong results and > > unnecessary failure > Mapped might has 'common set' with Active(file) and Inactive(file), > so the original code wants to exclude it from reclaimable. > In most cases, Mapped might be a subset of the latters. > However, if more than one tasks have mapped the same file, the count of > Mapped will grow bigger and bigger. In your case, I guess there are more > than one tasks have executed the following code: > > fd = open("my_file.dat"); > addr = mmap(fd, offset, size); > memcpy(dst, addr, size); > Thanks, that's really helpful information. Maybe just excluding nr_mapped leads to a too optimistic value for minimum_image_size - although it works nicely here. > > How about the something like this: > > static unsigned long minimum_image_size(unsigned long saveable) > { > unsigned long size, nr_map; > > size = global_page_state(NR_SLAB_RECLAIMABLE) > + global_page_state(NR_ACTIVE_ANON) > + global_page_state(NR_INACTIVE_ANON) > + global_page_state(NR_ACTIVE_FILE) > + global_page_state(NR_INACTIVE_FILE); > nr_map = global_page_state(NR_FILE_MAPPED); > size = size > nr_map ? (size - nr_map) : 0; > > return saveable <= size ? 0 : saveable - size; > } That's a good way to prevent the worst case and it would definitely increase reliability. At least a safeguard like this should be implemented. But I'm not completely happy yet. Because s2disk would still fail in higher-load-cases. Which would feel like a step back for me. Your info led me to think about excluding not only nr_mapped but also nr_active_file and nr_inactive_file. "Size" would always be positive, so no safeguard necessary. A first look at the data shows me that it would work more often than the original + safeguard. But I'm not yet sure. What do you think?
Hi Yu,

for reasons mentioned above, I've been thinking about an alternative to your suggestion. Taking your info on nr_mapped etc. into consideration, I found the following one, which has the advantage that it also works in higher-load cases - albeit then with a too optimistic estimate for minimum_image_size. So here it is:

static unsigned long minimum_image_size(unsigned long saveable)
{
	unsigned long size, sum_1, sum_2, nr_mapped;

	nr_mapped = global_page_state(NR_FILE_MAPPED);

	/* no common set with NR_FILE_MAPPED */
	sum_1 = global_page_state(NR_SLAB_RECLAIMABLE)
		+ global_page_state(NR_ACTIVE_ANON)
		+ global_page_state(NR_INACTIVE_ANON);

	/* possible common set with NR_FILE_MAPPED */
	sum_2 = + global_page_state(NR_ACTIVE_FILE)
		+ global_page_state(NR_INACTIVE_FILE);

	if (nr_mapped > sum_2)	/* NR_FILE_MAPPED bigger than common set */
		size = sum_1	/* no point in subtracting it */
			+ sum_2;
	else
		size = sum_1
			+ sum_2
			- nr_mapped;

	return saveable <= size ? 0 : saveable - size;
}

In case nr_mapped is relatively low, nothing changes - it is subtracted as in the original code. If nr_mapped is high, we would now prefer the "risk" of a too optimistic estimate for minimum_image_size over an unwarranted failure. As in your suggestion, "size" cannot become negative anymore, so we have a safeguard, too.

For VirtualBox there is a close correlation between the size of the VMs and nr_mapped: a 1-GB VM leads to an nr_mapped of 1 GB plus the nr_mapped from other processes, a 2-GB VM leads to 2 GB plus, and so on. So the "true" value for nr_mapped would be nr_mapped minus the size of the VirtualBox VMs. For my system that's always a few hundred thousand kB. Unfortunately, I don't see a way to derive that from the values available. But doing the calculations with this "true" nr_mapped gives a minimum_image_size that is clearly within the system's limit of max_size = 5900 MB - even in high-load cases.

A way out of this predicament could be to calculate minimum_image_size bottom-up, i.e. building the sum of all those items that can *not* be freed. Would that be possible?

Here are the test results of a case with low nr_mapped and a case with high nr_mapped. (minimum_pages = minimum_image_size.)

gitk + several other apps:

~> free
             total       used       free     shared    buffers     cached
Mem:      11998028    8417232    3580796     343756    2212040    1514996
-/+ buffers/cache:    4690196    7307832
Swap:     10713084          0   10713084

~> cat /proc/meminfo
MemTotal:       11998028 kB
MemFree:         3579344 kB
MemAvailable:    6709252 kB
Buffers:         2212048 kB
Cached:          1285388 kB
SwapCached:            0 kB
Active:          5380248 kB
Inactive:        2604112 kB
Active(anon):    4280708 kB
Inactive(anon):   551372 kB
Active(file):    1099540 kB
Inactive(file):  2052740 kB
Unevictable:          32 kB
Mlocked:              32 kB
SwapTotal:      10713084 kB
SwapFree:       10713084 kB
Dirty:               152 kB
Writeback:             0 kB
AnonPages:       4486972 kB
Mapped:           418004 kB
Shmem:            345160 kB
Slab:             270268 kB
SReclaimable:     231020 kB
SUnreclaim:        39248 kB
KernelStack:        6608 kB
PageTables:        48780 kB

s2both/resume: OK.

PM: Preallocating image memory...
save_highmem = 0 pages, 0 MB
saveable = 2209910 pages, 8632 MB
highmem = 0 pages, 0 MB
additional_pages = 220 pages, 0 MB
avail_normal = 3061389 pages, 11958 MB
count = 3023188 pages, 11809 MB
max_size = 1510460 pages, 5900 MB
user_specified_image_size = 1349778 pages, 5272 MB
adjusted_image_size = 1349779 pages, 5272 MB
minimum_pages = 236928 pages, 925 MB
target_image_size = 1349779 pages, 5272 MB
nr_should_reclaim = 860131 pages, 3359 MB
nr_reclaimed = 586655 pages, 2291 MB
preallocated_high_mem = 0 pages, 0 MB
to_alloc = 1512728 pages, 5909 MB
to_alloc_adjusted = 1512728 pages, 5909 MB
pages_allocated = 1512728 pages, 5909 MB
debug 26: alloc = 160681
debug 27: size = 0
debug 28: pages_highmem = 0
debug 29: alloc -= size; = 160681
debug 30: size = 160681
debug 31: pages_highmem = 0
debug 32: pages = 1673409
debug 33: pagecount before freeing unnecessary p. = 1348960
debug 34: pagecount after freeing unnecessary p. = 1348126
done (allocated 1673409 pages)
PM: Allocated 6693636 kbytes in 2.02 seconds (3313.68 MB/s)

-------------------------------------------------------------------------

2-GB-VM + gitk + several other apps:

~> free
             total       used       free     shared    buffers     cached
Mem:      11998028    9783252    2214776     293556    2169568    1081812
-/+ buffers/cache:    6531872    5466156
Swap:     10713084     588760   10124324

~> cat /proc/meminfo
MemTotal:       11998028 kB
MemFree:         2202012 kB
MemAvailable:    4475772 kB
Buffers:         2174972 kB
Cached:           490356 kB
SwapCached:       442936 kB
Active:          3764520 kB
Inactive:        3399284 kB
Active(anon):    3571804 kB
Inactive(anon):  1226852 kB
Active(file):     192716 kB
Inactive(file):  2172432 kB
Unevictable:          40 kB
Mlocked:              40 kB
SwapTotal:      10713084 kB
SwapFree:       10124620 kB
Dirty:              8740 kB
Writeback:             0 kB
AnonPages:       4095144 kB
Mapped:          2551032 kB
Shmem:            300128 kB
Slab:             210888 kB
SReclaimable:     155076 kB
SUnreclaim:        55812 kB
KernelStack:        7264 kB
PageTables:        54148 kB

s2both/resume: OK.

PM: Preallocating image memory...
save_highmem = 0 pages, 0 MB
saveable = 2553652 pages, 9975 MB
highmem = 0 pages, 0 MB
additional_pages = 220 pages, 0 MB
avail_normal = 3061432 pages, 11958 MB
count = 3023231 pages, 11809 MB
max_size = 1510481 pages, 5900 MB
user_specified_image_size = 1349778 pages, 5272 MB
adjusted_image_size = 1349779 pages, 5272 MB
minimum_pages = 685190 pages, 2676 MB
target_image_size = 1349779 pages, 5272 MB
nr_should_reclaim = 1203873 pages, 4702 MB
nr_reclaimed = 652695 pages, 2549 MB
preallocated_high_mem = 0 pages, 0 MB
to_alloc = 1512750 pages, 5909 MB
to_alloc_adjusted = 1512750 pages, 5909 MB
pages_allocated = 1512750 pages, 5909 MB
debug 26: alloc = 160702
debug 27: size = 0
debug 28: pages_highmem = 0
debug 29: alloc -= size; = 160702
debug 30: size = 160702
debug 31: pages_highmem = 0
debug 32: pages = 1673452
debug 33: pagecount before freeing unnecessary p. = 1356279
debug 34: pagecount after freeing unnecessary p. = 1343240
done (allocated 1673452 pages)
PM: Allocated 6693808 kbytes in 11.95 seconds (560.15 MB/s)
(In reply to Jay from comment #2)
> But as you can see "size" would be negative and s2disk would probably fail:
>
> freeable – mapped: -800.096 -200.024
>
BTW, why would s2disk fail if it is negative? Since size is an unsigned integer, the "negative" value will become very big, thus minimum_image_size will return 0, and thus shrink_all_memory will reclaim as many pages as possible. So?
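(A quick user-space illustration of that arithmetic, using the page counts from comment #2; the saveable value is made up for the example:)

#include <stdio.h>

int main(void)
{
	unsigned long freeable = 481011;  /* slab_rec + anon + file LRUs (comment #2) */
	unsigned long mapped   = 681035;  /* nr_mapped (comment #2)                   */
	unsigned long saveable = 2000000; /* made-up example value                     */

	unsigned long size = freeable - mapped;   /* "negative" -> wraps around */

	printf("size     = %lu\n", size);
	printf("returned = %lu\n", saveable <= size ? 0UL : saveable - size);
	return 0;
}

It prints a huge "size" and a returned value of 0, i.e. exactly the behaviour described above.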
(In reply to Chen Yu from comment #8) > (In reply to Jay from comment #2) > > But as you can see "size" would be negative and s2disk would probably fail: > > > > freeable – mapped: -800.096 -200.024 > > > BTW, why the s2disk fail if it is negative? since size is a unsigned > int,the negative value will become very big, thus minimum_image_size will > return 0, thus shrink_all_memory will reclaim as much as possible pages. so? True, but also see https://bugzilla.kernel.org/show_bug.cgi?id=47931#c63. In that case s2disk apparently failed although minimum_image_size returned 0. But the main reason for failure is that the value returned by minimum_image_size() is > max_size. And if nr_mapped is high this happens more often than not. Because the function can't handle that situation. The return-value then is way too high. On the other hand: As long as minimum_image_size is <= max_size, s2disk can handle even very high-load cases. It almost seems to me that one might use an arbitrary value for minimum_image_size: As long as it is <= max_size (and as long as the actual memory-load is also <= max_size), s2disk works. Why else would it succeed if the return-value is 0? Take a look at how that value propagates down to pages = preallocate_image_memory(alloc, avail_normal); if (pages < alloc) {...} Once pages < alloc, it's "game over". With respect to success or not, shrink_all_memory() is IMO not so important. The key-issue is the value returned by minimum_image_size(). This should of course be as accurate as possible. But as long as the actual memory-load is <= max_size, almost any value <= max_size seems better than one > max_size. For that reason I can live well with a more optimistic value for minimum_image_size. Also take a look at the test-result with gitk where nr_mapped is low but anon-pages is high: We have a memory-load of 8632 MB but a minimum_image_size of just 925 MB. To me that seems optimistic, too. :) If we want more accuracy, maybe building minimum_image_size from those items that can not be freed, is an approach to explore. So long!
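In other words (a simplified paraphrase of the surrounding flow, not the verbatim kernel source; the highmem branch is left out since highmem = 0 in all the logs here):

	/* hibernate_preallocate_memory(), roughly: */
	pages = preallocate_image_memory(alloc, avail_normal);
	if (pages < alloc)
		goto err_out;	/* "game over": the preallocation is undone
				 * and hibernation aborts with an error */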
snip
>
> static unsigned long minimum_image_size(unsigned long saveable)
> {
> 	unsigned long size, sum_1, sum_2, nr_mapped;
>
> 	nr_mapped = global_page_state(NR_FILE_MAPPED);
>
> 	/* no common set with NR_FILE_MAPPED */
> 	sum_1 = global_page_state(NR_SLAB_RECLAIMABLE)
> 		+ global_page_state(NR_ACTIVE_ANON)
> 		+ global_page_state(NR_INACTIVE_ANON);
>
> 	/* possible common set with NR_FILE_MAPPED */
> 	sum_2 = + global_page_state(NR_ACTIVE_FILE)
> 		+ global_page_state(NR_INACTIVE_FILE);
>
> 	if (nr_mapped > sum_2)	/* NR_FILE_MAPPED bigger than common set */
> 		size = sum_1	/* no point in subtracting it */
> 			+ sum_2;
> 	else
> 		size = sum_1
> 			+ sum_2
> 			- nr_mapped;
>
> 	return saveable <= size ? 0 : saveable - size;
> }

One unnecessary "+". Should of course be:

	/* possible common set with NR_FILE_MAPPED */
	sum_2 = global_page_state(NR_ACTIVE_FILE)
		+ global_page_state(NR_INACTIVE_FILE);

Jay
Turned out my suggestion from Comment 7 is not bullet-proof.

There may be cases where nr_mapped is high but Active(file) + Inactive(file) (= sum_2) is even higher, so it behaves just like the original version and fails (see example below).

So if we want s2disk to also work reliably when nr_mapped is high, I see two possibilities:
- exclude nr_mapped from the calculation (see the sketch following this comment)
- build minimum_image_size from those items that can not be freed

For the latter I have constructed a minimum_image_size(), to get an idea of the value returned (see example below). Values over 5900 MB mean failure.

--------

~> free
             total       used       free     shared    buffers     cached
Mem:      11998028    9927608    2070420     305720    4183136    1542068
-/+ buffers/cache:    4202404    7795624
Swap:     10713084     498536   10214548

~> cat /proc/meminfo
MemTotal:       11998028 kB
MemFree:         2053796 kB
MemAvailable:    6802532 kB
Buffers:         4183152 kB
Cached:           959476 kB
SwapCached:       417540 kB
Active:          1250348 kB
Inactive:        4849364 kB
Active(anon):     269652 kB
Inactive(anon):  1007892 kB
Active(file):     980696 kB
Inactive(file):  3841472 kB
Unevictable:          40 kB
Mlocked:              40 kB
SwapTotal:      10713084 kB
SwapFree:       10214680 kB
Dirty:               176 kB
Writeback:             0 kB
AnonPages:        605396 kB
Mapped:          3730084 kB
Shmem:            320460 kB
Slab:             247008 kB
SReclaimable:     179960 kB
SUnreclaim:        67048 kB
KernelStack:        7792 kB
PageTables:        50544 kB

PM: Preallocating image memory...
save_highmem = 0 pages, 0 MB
saveable = 2586720 pages, 10104 MB
highmem = 0 pages, 0 MB
additional_pages = 220 pages, 0 MB
avail_normal = 3061361 pages, 11958 MB
count = 3023160 pages, 11809 MB
max_size = 1510446 pages, 5900 MB
user_specified_image_size = 1349778 pages, 5272 MB
adjusted_image_size = 1349779 pages, 5272 MB
minimum_pages (by exluding nr_mapped) = 981340 pages, 3833 MB
minimum_pages, by nonfreeable items = 1083551 pages, 4232 MB
minimum_pages, if (nr_mapped > sum_2) = 1927305 pages, 7528 MB
minimum_pages, orig. version = 1927305 pages, 7528 MB
target_image_size = 1349779 pages, 5272 MB
nr_should_reclaim = 1236941 pages, 4831 MB
nr_reclaimed = 1058877 pages, 4136 MB
preallocated_high_mem = 0 pages, 0 MB
to_alloc = 1512714 pages, 5909 MB
to_alloc_adjusted = 1512714 pages, 5909 MB
pages_allocated = 1512714 pages, 5909 MB
debug 26: alloc = 160667
debug 27: size = 0
debug 28: pages_highmem = 0
debug 29: alloc -= size; = 160667
debug 30: size = 160667
debug 31: pages_highmem = 0
debug 32: pages = 1673381
debug 33: pagecount before freeing unnecessary p. = 1351076
debug 34: pagecount after freeing unnecessary p. = 1350516
done (allocated 1673381 pages)
PM: Allocated 6693524 kbytes in 1.61 seconds (4157.46 MB/s)
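For reference, the first option is simply the original function with the NR_FILE_MAPPED term dropped - a sketch of the modification described in comment #1 that has been in use here all along:

static unsigned long minimum_image_size(unsigned long saveable)
{
	unsigned long size;

	size = global_page_state(NR_SLAB_RECLAIMABLE)
		+ global_page_state(NR_ACTIVE_ANON)
		+ global_page_state(NR_INACTIVE_ANON)
		+ global_page_state(NR_ACTIVE_FILE)
		+ global_page_state(NR_INACTIVE_FILE);

	return saveable <= size ? 0 : saveable - size;
}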
(In reply to Jay from comment #11) > Turned out my suggestion from Comment 7 is not bullet-proof. > > There may be cases where nr_mapped is high but Active(file) + Inactive(file) > (= sum_2) is even higher and so it behaves just like the original version > and fails (see example below). > > So if we want s2disk to also work reliably when nr_mapped is high, I see two > possibilities: > - exclude nr_mapped from the calculation > - build minimum_image_size from those items that can not be freed > > For the latter I have constructed a minimum_image_size(), to get an idea of > the value returned (see example below). > Values over 5900 MB mean failure. > > -------- > > ~> free > total used free shared buffers cached > Mem: 11998028 9927608 2070420 305720 4183136 1542068 > -/+ buffers/cache: 4202404 7795624 > Swap: 10713084 498536 10214548 > > ~> cat /proc/meminfo > MemTotal: 11998028 kB > MemFree: 2053796 kB > MemAvailable: 6802532 kB > Buffers: 4183152 kB > Cached: 959476 kB > SwapCached: 417540 kB > Active: 1250348 kB > Inactive: 4849364 kB > Active(anon): 269652 kB > Inactive(anon): 1007892 kB > Active(file): 980696 kB > Inactive(file): 3841472 kB > Unevictable: 40 kB > Mlocked: 40 kB > SwapTotal: 10713084 kB > SwapFree: 10214680 kB > Dirty: 176 kB > Writeback: 0 kB > AnonPages: 605396 kB > Mapped: 3730084 kB > Shmem: 320460 kB > Slab: 247008 kB > SReclaimable: 179960 kB > SUnreclaim: 67048 kB > KernelStack: 7792 kB > PageTables: 50544 kB > > > PM: Preallocating image memory... > save_highmem = 0 pages, 0 MB > saveable = 2586720 pages, 10104 MB > highmem = 0 pages, 0 MB > additional_pages = 220 pages, 0 MB > avail_normal = 3061361 pages, 11958 MB > count = 3023160 pages, 11809 MB > max_size = 1510446 pages, 5900 MB > user_specified_image_size = 1349778 pages, 5272 MB > adjusted_image_size = 1349779 pages, 5272 MB > > minimum_pages (by exluding nr_mapped) = 981340 pages, 3833 MB > minimum_pages, by nonfreeable items = 1083551 pages, 4232 MB > minimum_pages, if (nr_mapped > sum_2) = 1927305 pages, 7528 MB > minimum_pages, orig. version = 1927305 pages, 7528 MB Interesting, I was almost persuaded by your previous solution at #Comment 7, but for the nr_mapped > sum_2 case, why not: if (nr_mapped > sum_2) size = sum_1; since size should be the finally reclaimable page number, and nr-mapped has occupied most of the active/inactive file pages.
> minimum_pages, by nonfreeable items = 1083551 pages, 4232 MB how do you calculate it?
snip
> >
> > minimum_pages (by exluding nr_mapped) = 981340 pages, 3833 MB
> > minimum_pages, by nonfreeable items = 1083551 pages, 4232 MB
> > minimum_pages, if (nr_mapped > sum_2) = 1927305 pages, 7528 MB
> > minimum_pages, orig. version = 1927305 pages, 7528 MB
>
> Interesting, I was almost persuaded by your previous solution at #Comment 7,

Me too.
But then I looked at the facts. And the facts looked back at me with their hard, cold eyes... ;)

> but for the nr_mapped > sum_2 case, why not:
> if (nr_mapped > sum_2)
>     size = sum_1;
> since size should be the finally reclaimable page number, and nr-mapped has
> occupied most of the active/inactive file pages.

I think it would fail too often. Like in the following case, where "size" would then be only 1.767.476 kB and minimum_image_size 7.226.316 kB:

~> cat /proc/meminfo
MemTotal:       11998028 kB
MemFree:         3429856 kB
MemAvailable:    5346516 kB
Buffers:          603656 kB
Cached:          1813444 kB
SwapCached:       207108 kB
Active:          1394516 kB
Inactive:        2213028 kB
Active(anon):     848996 kB
Inactive(anon):   754800 kB
Active(file):     545520 kB
Inactive(file):  1458228 kB
Unevictable:          80 kB
Mlocked:              80 kB
SwapTotal:      10713084 kB
SwapFree:        9886664 kB
Dirty:               156 kB
Writeback:             0 kB
AnonPages:       1097428 kB
Mapped:          4993336 kB
Shmem:            413352 kB
Slab:             243692 kB
SReclaimable:     163680 kB
SUnreclaim:        80012 kB
KernelStack:        8912 kB
PageTables:        59596 kB

PM: Preallocating image memory...
save_highmem = 0 pages, 0 MB
saveable = 2248550 pages, 8783 MB
highmem = 0 pages, 0 MB
additional_pages = 220 pages, 0 MB
avail_normal = 3061493 pages, 11958 MB
count = 3023292 pages, 11809 MB
max_size = 1510512 pages, 5900 MB
user_specified_image_size = 1349778 pages, 5272 MB
adjusted_image_size = 1349779 pages, 5272 MB
minimum_pages (by exluding nr_mapped) = 1264899 pages, 4941 MB
minimum_pages, by nonfreeable items = 1436206 pages, 5610 MB
minimum_pages, if (nr_mapped > sum_2) = 1264899 pages, 4941 MB
minimum_pages, orig. version = 0 pages, 0 MB
target_image_size = 1349779 pages, 5272 MB
nr_should_reclaim = 898771 pages, 3510 MB
nr_reclaimed = 423743 pages, 1655 MB
preallocated_high_mem = 0 pages, 0 MB
to_alloc = 1512780 pages, 5909 MB
to_alloc_adjusted = 1512780 pages, 5909 MB
pages_allocated = 1512780 pages, 5909 MB
...
(In reply to Chen Yu from comment #13)
> > minimum_pages, by nonfreeable items = 1083551 pages, 4232 MB
> how do you calculate it?

I don't know much about this field, so a good deal of guesswork was involved. I hope not everything is BS.

/* For testing only. minimum_image_size built from (presumably) non-freeable items */
static unsigned long minimum_image_size_nonfreeable(void)
{
	unsigned long minsize;

	minsize = global_page_state(NR_UNEVICTABLE)
		+ global_page_state(NR_MLOCK)
		+ global_page_state(NR_FILE_DIRTY)
		+ global_page_state(NR_WRITEBACK)
		+ global_page_state(NR_FILE_MAPPED)
		+ global_page_state(NR_SHMEM)
		+ global_page_state(NR_SLAB_UNRECLAIMABLE)
		+ global_page_state(NR_KERNEL_STACK)
		+ global_page_state(NR_PAGETABLE);

	return minsize;
}

Here's a case with a low value for nr_mapped (no VM loaded):

~> cat /proc/meminfo
MemTotal:       11998028 kB
MemFree:         4380836 kB
MemAvailable:    6734148 kB
Buffers:         1319704 kB
Cached:          1387932 kB
SwapCached:            0 kB
Active:          5006612 kB
Inactive:        2210540 kB
Active(anon):    4511240 kB
Inactive(anon):   295600 kB
Active(file):     495372 kB
Inactive(file):  1914940 kB
Unevictable:          32 kB
Mlocked:              32 kB
SwapTotal:      10713084 kB
SwapFree:       10713084 kB
Dirty:               252 kB
Writeback:             0 kB
AnonPages:       4509608 kB
Mapped:           360992 kB
Shmem:            297316 kB
Slab:             235320 kB
SReclaimable:     196392 kB
SUnreclaim:        38928 kB
KernelStack:        8064 kB
PageTables:        49000 kB

s2both/resume OK. Resume again took relatively long (gitk).

PM: Preallocating image memory...
save_highmem = 0 pages, 0 MB
saveable = 2012027 pages, 7859 MB
highmem = 0 pages, 0 MB
additional_pages = 220 pages, 0 MB
avail_normal = 3061353 pages, 11958 MB
count = 3023152 pages, 11809 MB
max_size = 1510442 pages, 5900 MB
user_specified_image_size = 1349778 pages, 5272 MB
adjusted_image_size = 1349779 pages, 5272 MB
minimum_pages (by exluding nr_mapped) = 116532 pages, 455 MB
minimum_pages, by nonfreeable items = 240289 pages, 938 MB
minimum_pages, if (nr_mapped > sum_2) = 224871 pages, 878 MB
minimum_pages, orig. version = 224871 pages, 878 MB
target_image_size = 1349779 pages, 5272 MB
nr_should_reclaim = 662248 pages, 2586 MB
nr_reclaimed = 446618 pages, 1744 MB
preallocated_high_mem = 0 pages, 0 MB
to_alloc = 1512710 pages, 5909 MB
to_alloc_adjusted = 1512710 pages, 5909 MB
pages_allocated = 1512710 pages, 5909 MB
...
(In reply to Jay from comment #14)
> snip
> > >
> > > minimum_pages (by exluding nr_mapped) = 981340 pages, 3833 MB
> > > minimum_pages, by nonfreeable items = 1083551 pages, 4232 MB
> > > minimum_pages, if (nr_mapped > sum_2) = 1927305 pages, 7528 MB
> > > minimum_pages, orig. version = 1927305 pages, 7528 MB
> >
> > Interesting, I was almost persuaded by your previous solution at #Comment 7,
>
> Me too.
> But then I looked at the facts. And the facts looked back at me with their
> hard, cold eyes... ;)
>
> > but for the nr_mapped > sum_2 case, why not:
> > if (nr_mapped > sum_2)
> > size = sum_1;
> > since size should be the finally reclaimable page number, and nr-mapped has
> > occupied most of the active/inactive file pages.
>
> I think it would fail too often. Like in the following case where "size"
> then would be only 1.767.476 kB and minimum_image_size 7.226.316 kB:
>
OK. I'm still curious about why nr_mapped occupies so many pages, so I checked the code again. It seems that the nr_mapped count will not increase if another task mmaps the same file (#Comment 5 might be wrong: if task A has mmapped the file, task B mmapping the same file at the same offset will not increase nr_mapped), so in theory nr_mapped should not exceed the total of active + inactive (except that some device drivers may have remapped kernel pages, vmalloc for example, but this situation is not common).

In order to find the root cause of this problem, we need to find out why nr_mapped is so high. Please help to test as follows:

1. Before the VM is allocated (when nr_mapped is low):

[root@localhost tracing]# pwd
/sys/kernel/debug/tracing

// track page_add_file_rmap, which will increase nr_mapped
[root@localhost tracing]# echo page_add_file_rmap > set_ftrace_filter
// enable function tracer
[root@localhost tracing]# echo function > current_tracer
// enable function callback
[root@localhost tracing]# echo 1 > options/func_stack_trace

2. Enable your VM (VirtualBox?), increase nr_mapped to a high level, then:

// save the tracer data (might be very big)
[root@localhost tracing]# cat trace > /home/tracer_nr_mapped.log
[root@localhost tracing]# echo 0 > options/func_stack_trace
[root@localhost tracing]# echo > set_ftrace_filter
// stop the tracer
[root@localhost tracing]# echo 0 > tracing_on

3. Attach your tracer_nr_mapped.log
Created attachment 199581 [details] Tracing-Log
Hi Yu, the log is attached, hope it helps. It would be nice if you could give me a feedback on minimum_image_size_nonfreeable() and tell me which items to delete or add.
(In reply to Jay from comment #18)
> Hi Yu,
>
> the log is attached, hope it helps.
>
> It would be nice if you could give me a feedback on
> minimum_image_size_nonfreeable() and tell me which items to delete or add.

Thanks for your info.

Theoretically, minimum_image_size should be the number of saveable pages minus the number of reclaimable pages 'within' the saveable pages. Why? Because the snapshot image only concerns the saveable pages. You cannot simply add the unreclaimable counters together without considering the saveable pages, IMO. Besides, NR_FILE_DIRTY and NR_WRITEBACK might actually be reclaimable.

With regard to NR_FILE_MAPPED, I checked your log and it shows that most of nr_mapped was increased in the following path (yes, it is reading from a mmap(fd)):

VirtualBox-3151 [000] ...1   523.775961: page_add_file_rmap <-do_set_pte
VirtualBox-3151 [000] ...1   523.775963: <stack trace>
 => update_curr
 => page_add_file_rmap
 => put_prev_entity
 => page_add_file_rmap
 => do_set_pte
 => filemap_map_pages
 => do_read_fault.isra.61
 => handle_mm_fault
 => get_futex_key
 => hrtimer_wakeup
 => __do_page_fault
 => do_futex
 => do_page_fault
 => page_fault

However, before a page is added to NR_FILE_MAPPED, it has already been linked to the inactive_file list; that is to say, Inactive(file) + Active(file) should be bigger than or equal to NR_FILE_MAPPED. I don't know why NR_FILE_MAPPED is so large; I'll send this question to the mm mailing list and ask for help. And which kernel version are you using? How many CPUs does your platform have?
(In reply to Chen Yu from comment #19) > Theoretically, the minimum_image_size should be the number of pages minus > the number of reclaimable 'within' savable_pages, why? because the snaphot > image only concerns about the savable pages. You can not simply add the > unreclaimable together w/o considering the savable pages IMO, besides, > NR_FILE_DIRTY and NR_WRITEBACK might be actually reclaimable. But in low-nr_mapped-cases the values are close to what the original delivers: minimum_pages, by nonfreeable items = 224814 pages, 878 MB minimum_pages, orig. version = 221699 pages, 866 MB > large, I'll send this question to mm-mailist and ask for a help. And which > kernel version are you using? how many CPUs does you platform have? Right now it's kernel 3.18.xx. But the problem was there from at least 3.7 on - whether it was an untouched distribution-kernel or self-compiled. I haven't tried a kernel later than 4. One Intel 64-bit-CPU.
(In reply to Jay from comment #20) > (In reply to Chen Yu from comment #19) > > > Theoretically, the minimum_image_size should be the number of pages minus > > the number of reclaimable 'within' savable_pages, why? because the snaphot > > image only concerns about the savable pages. You can not simply add the > > unreclaimable together w/o considering the savable pages IMO, besides, > > NR_FILE_DIRTY and NR_WRITEBACK might be actually reclaimable. > > But in low-nr_mapped-cases the values are close to what the original > delivers: > > minimum_pages, by nonfreeable items = 224814 pages, 878 MB > minimum_pages, orig. version = 221699 pages, 866 MB > Humm, yes, it is suitable in your case, but I'm not sure if it fits for other users, we need more investigation on this. But before that, could you please have a try on latest 4.4 kernel? Since I can not reproduce your problem on this version, I want to confirm if the calculating of nr_mapped has been fixed already.
(In reply to Chen Yu from comment #21) > (In reply to Jay from comment #20) > > (In reply to Chen Yu from comment #19) > > > > > Theoretically, the minimum_image_size should be the number of pages minus > > > the number of reclaimable 'within' savable_pages, why? because the > snaphot > > > image only concerns about the savable pages. You can not simply add the > > > unreclaimable together w/o considering the savable pages IMO, besides, > > > NR_FILE_DIRTY and NR_WRITEBACK might be actually reclaimable. > > > > But in low-nr_mapped-cases the values are close to what the original > > delivers: > > > > minimum_pages, by nonfreeable items = 224814 pages, 878 MB > > minimum_pages, orig. version = 221699 pages, 866 MB > > > Humm, yes, it is suitable in your case, but I'm not sure if it fits for > other users, we need more investigation on this. But before that, could you > please have a try on latest 4.4 kernel? Since I can not reproduce your > problem on this version, I want to confirm if the calculating of nr_mapped > has been fixed already. No, values are not different with 4.4.0 (see below). Which version of VirtualBox are you using? I'm using 4.3.34. The format of my VM-harddisks is native-VB (.vdi). Kernel 4.4.0, a 2-GB-XP-VM running. "Baseline"-nr_mapped was 484368 kB: ~> cat /proc/meminfo MemTotal: 11975164 kB MemFree: 5432192 kB MemAvailable: 8155428 kB Buffers: 2114456 kB Cached: 1056064 kB SwapCached: 0 kB Active: 2212676 kB Inactive: 1788856 kB Active(anon): 832660 kB Inactive(anon): 511620 kB Active(file): 1380016 kB Inactive(file): 1277236 kB Unevictable: 40 kB Mlocked: 40 kB SwapTotal: 10713084 kB SwapFree: 10713084 kB Dirty: 80 kB Writeback: 288 kB AnonPages: 831128 kB Mapped: 2769240 kB Shmem: 513192 kB Slab: 175244 kB SReclaimable: 117620 kB SUnreclaim: 57624 kB KernelStack: 6976 kB PageTables: 45912 kB
(In reply to Jay from comment #22) > (In reply to Chen Yu from comment #21) > > (In reply to Jay from comment #20) > > > (In reply to Chen Yu from comment #19) > > > > > > > Theoretically, the minimum_image_size should be the number of pages > minus > > > > the number of reclaimable 'within' savable_pages, why? because the > snaphot > > > > image only concerns about the savable pages. You can not simply add the > > > > unreclaimable together w/o considering the savable pages IMO, besides, > > > > NR_FILE_DIRTY and NR_WRITEBACK might be actually reclaimable. > > > > > > But in low-nr_mapped-cases the values are close to what the original > > > delivers: > > > > > > minimum_pages, by nonfreeable items = 224814 pages, 878 MB > > > minimum_pages, orig. version = 221699 pages, 866 MB > > > > > Humm, yes, it is suitable in your case, but I'm not sure if it fits for > > other users, we need more investigation on this. But before that, could you > > please have a try on latest 4.4 kernel? Since I can not reproduce your > > problem on this version, I want to confirm if the calculating of nr_mapped > > has been fixed already. > > No, values are not different with 4.4.0 (see below). Which version of > VirtualBox are you using? I'm using 4.3.34. The format of my VM-harddisks is > native-VB (.vdi). > > Kernel 4.4.0, a 2-GB-XP-VM running. "Baseline"-nr_mapped was 484368 kB: > I did not use Virtual Box but wrote a small mmap(2G disk file) to simulate. What does VM-harddisks(native-VB) mean, is it something like a ramdisk for Virtual Box?
... > I did not use Virtual Box but wrote a small mmap(2G disk file) to simulate. > What does VM-harddisks(native-VB) mean, is it something like a ramdisk for > Virtual Box? Sorry. .vdi is VB's standard-format for the VM's virtual harddisk, like .vmdk is for VMware. I've also created a VB-VM with the .vmdk-format but nr_mapped is the same as with an otherwise identical .vdi-VM. Don't know whether this is relevant but I've noticed that the nr_mapped for a 32-bit-Windows-VM is significantly higher than that for 64-bit-Linux-VMs. The close correlation of memory-value chosen for the VM and nr_mapped is only true for Windows-VMs: A 2-GB-XP-VM *always* leads to nr_mapped of 2 GB plus baseline-nr_mapped (as in the example in my last comment). But a 2- or 3-GB-Linux-VM increases nr_mapped only by about 500 MB, at least initially. With each additional Linux-VM nr_mapped goes up but not as much as with a 32-bit-Windows-VM. Nevertheless, nr_mapped is still relatively high. Example Baseline: ~> cat /proc/meminfo MemTotal: 11997612 kB MemFree: 8232552 kB MemAvailable: 10381072 kB Buffers: 1394968 kB Cached: 1155064 kB SwapCached: 0 kB Active: 1406316 kB Inactive: 2017316 kB Active(anon): 875448 kB Inactive(anon): 303740 kB Active(file): 530868 kB Inactive(file): 1713576 kB Unevictable: 32 kB Mlocked: 32 kB SwapTotal: 10713084 kB SwapFree: 10713084 kB Dirty: 60 kB Writeback: 0 kB AnonPages: 873648 kB Mapped: 398880 kB Shmem: 305588 kB Slab: 184752 kB SReclaimable: 146024 kB SUnreclaim: 38728 kB KernelStack: 6832 kB PageTables: 40300 kB 2-GB-Linux-VM, values at login: ~> cat /proc/meminfo MemTotal: 11997612 kB MemFree: 7610076 kB MemAvailable: 9761128 kB Buffers: 1397120 kB Cached: 1230064 kB SwapCached: 0 kB Active: 1648540 kB Inactive: 1901528 kB Active(anon): 924724 kB Inactive(anon): 378520 kB Active(file): 723816 kB Inactive(file): 1523008 kB Unevictable: 40 kB Mlocked: 40 kB SwapTotal: 10713084 kB SwapFree: 10713084 kB Dirty: 2068 kB Writeback: 0 kB AnonPages: 922916 kB Mapped: 935824 kB Shmem: 380368 kB Slab: 188808 kB SReclaimable: 146324 kB SUnreclaim: 42484 kB KernelStack: 7200 kB PageTables: 42348 kB Another 2-GB-Linux-VM started (from saved state): ~> cat /proc/meminfo MemTotal: 11997612 kB MemFree: 6021372 kB MemAvailable: 8572456 kB Buffers: 1796464 kB Cached: 1262508 kB SwapCached: 0 kB Active: 1702776 kB Inactive: 2332824 kB Active(anon): 978564 kB Inactive(anon): 410864 kB Active(file): 724212 kB Inactive(file): 1921960 kB Unevictable: 40 kB Mlocked: 40 kB SwapTotal: 10713084 kB SwapFree: 10713084 kB Dirty: 576 kB Writeback: 0 kB AnonPages: 976764 kB Mapped: 2040000 kB Shmem: 412712 kB Slab: 199012 kB SReclaimable: 147696 kB SUnreclaim: 51316 kB KernelStack: 7600 kB PageTables: 45504 kB Another 3-GB-Linux-VM started, values at login: ~> cat /proc/meminfo MemTotal: 11997612 kB MemFree: 5272052 kB MemAvailable: 8006412 kB Buffers: 1979004 kB Cached: 1305652 kB SwapCached: 0 kB Active: 1777544 kB Inactive: 2555572 kB Active(anon): 1050316 kB Inactive(anon): 453812 kB Active(file): 727228 kB Inactive(file): 2101760 kB Unevictable: 40 kB Mlocked: 40 kB SwapTotal: 10713084 kB SwapFree: 10713084 kB Dirty: 1160 kB Writeback: 0 kB AnonPages: 1048544 kB Mapped: 2509620 kB Shmem: 455660 kB Slab: 203416 kB SReclaimable: 148616 kB SUnreclaim: 54800 kB KernelStack: 8048 kB PageTables: 47456 kB
Hi Yu,

I know what caused the high nr_mapped.

Two days ago I noticed that one of the VMs (which I almost never use) did not increase nr_mapped above the baseline value. So I checked its settings and compared them to those of the other VMs. No relevant differences. Even with every possible option that the VirtualBox GUI offers set to the same value, nr_mapped stayed low for that VM while it rose to the known high levels for the others.

Today I used VB's command-line tool, VBoxManage, to compare the settings again. It turned out that the VM in fact does have a different setting: "Large Pages: on" instead of "off" like the others.

The VirtualBox manual says:

"• --largepages on|off: If hardware virtualization and nested paging are enabled, for Intel VT-x only, an additional performance improvement of up to 5% can be obtained by enabling this setting. This causes the hypervisor to use large pages to reduce TLB use and overhead."

Up to this day, I didn't even know this option existed, let alone having set it to "on" for a VM. I use the VB GUI to create and manage the VMs, and the GUI doesn't show this option. I must have created that VM with a VB version that had "Large Pages: on" as the default, perhaps under Windows. I can't imagine any other explanation.

Setting "Large Pages: on" for the other VMs had a dramatic effect on nr_mapped. For three 2-GB VMs loaded (plus other apps running): Mapped: 621284 kB. That's a value the original minimum_image_size() can handle:

PM: Preallocating image memory...
save_highmem = 0 pages, 0 MB
saveable = 2155130 pages, 8418 MB
highmem = 0 pages, 0 MB
additional_pages = 220 pages, 0 MB
avail_normal = 3061409 pages, 11958 MB
count = 3023206 pages, 11809 MB
max_size = 1510469 pages, 5900 MB
user_specified_image_size = 1349731 pages, 5272 MB
adjusted_image_size = 1349732 pages, 5272 MB
minimum_pages (by excluding nr_mapped) = 1158750 pages, 4526 MB
minimum_pages, by nonfreeable items = 311775 pages, 1217 MB
minimum_pages, if (nr_mapped > sum_2) = 1328725 pages, 5190 MB
minimum_pages, orig. version = 1328725 pages, 5190 MB
target_image_size = 1349732 pages, 5272 MB
nr_should_reclaim = 805398 pages, 3146 MB
nr_reclaimed = 669510 pages, 2615 MB
...

To me it seems we may close this bug report as "solved", although I still think that a high nr_mapped is the Achilles' heel of minimum_image_size().

If this "Large Pages: on|off" info is in any way relevant for Linux memory management, you will know what to do. It would be nice to know if there's a kernel setting that could prevent this nasty behaviour.

I shall inform a Linux forum about this (I promised to do so) and perhaps the VirtualBox people.

For now: Thanks for your time and effort!

Jay
(In reply to Jay from comment #25)
> Setting "Large Pages: on" for the other VMs had a dramatic effect on
> nr_mapped. For three 2-GB-VMs loaded (plus other apps running): Mapped:
> 621284 kB.
>
> To me it seems we may close this bug-report as "solved". Although I still
> think that a high nr_mapped is the Achilles' heel of minimum_image_size().
Yes, there is still a problem in minimum_image_size IMO.
>
> If this "Large Page: on|off"-info is in any way relevant for the
> Linux-memory-management, you will know what to do.
> Would be nice to know if there's a kernel-setting that could prevent this
> nasty behaviour.
Can you upload your kernel config file? I wonder if this is related to transparent huge pages or hugetlb. Another question: does /proc/meminfo show any hugetlb info such as:

HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      243504 kB
DirectMap2M:    11210752 kB
DirectMap1G:    24117248 kB
(In reply to Jay from comment #25) > To me it seems we may close this bug-report as "solved". Although I still > think that a high nr_mapped is the Achilles' heel of minimum_image_size(). > > If this "Large Page: on|off"-info is in any way relevant for the > Linux-memory-management, you will know what to do. > Would be nice to know if there's a kernel-setting that could prevent this > nasty behaviour. > I shall inform a Linux-forum about this (I promised to do so) and perhaps > the VirtualBox-people. It would be nice if you can help ask the VirtualBox people what 'Large Page' does in their software. > > For now: Thanks for your time and effort! > > Jay
(In reply to Chen Yu from comment #26)
> (In reply to Jay from comment #25)
> > Setting "Large Pages: on" for the other VMs had a dramatic effect on
> > nr_mapped. For three 2-GB-VMs loaded (plus other apps running): Mapped:
> > 621284 kB.
> >
> > To me it seems we may close this bug-report as "solved". Although I still
> > think that a high nr_mapped is the Achilles' heel of minimum_image_size().
> Yes, there is still a problem in minimum_image_size IMO.
>
And it seems a hard nut to crack. If you can't find a way to stop nr_mapped going through the roof, or can't warm to the idea of excluding nr_mapped, perhaps we should just use a value like "used - buffers/cache" from "free" for minimum_pages. This would have worked most of the time. But I couldn't find the source code, so I don't know how it is calculated.

> > If this "Large Page: on|off"-info is in any way relevant for the
> > Linux-memory-management, you will know what to do.
> > Would be nice to know if there's a kernel-setting that could prevent this
> > nasty behaviour.
> Can you upload your kernel config file? I wonder if this is related to
> transparent huge pages or hugetlb. Another question: does /proc/meminfo
> show any hugetlb info such as:
>
> HugePages_Total: 0
> HugePages_Free: 0
> HugePages_Rsvd: 0
> HugePages_Surp: 0
> Hugepagesize: 2048 kB
> DirectMap4k: 243504 kB
> DirectMap2M: 11210752 kB
> DirectMap1G: 24117248 kB

AnonHugePages:    821248 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        7680 kB
DirectMap2M:    12238848 kB
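For what it's worth, the "-/+ buffers/cache" line in free is, as far as I know (from memory, not checked against the procps source), derived from /proc/meminfo as (MemTotal - MemFree) - Buffers - Cached. A small sketch that computes it the same way:

#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *f = fopen("/proc/meminfo", "r");
	char key[64];
	unsigned long val, total = 0, free_kb = 0, buffers = 0, cached = 0;

	if (!f) {
		perror("/proc/meminfo");
		return 1;
	}
	/* lines look like "MemTotal:  11998028 kB"; entries without a "kB"
	 * suffix just end the pattern early and are skipped gracefully */
	while (fscanf(f, "%63s %lu kB", key, &val) == 2) {
		if (!strcmp(key, "MemTotal:"))
			total = val;
		else if (!strcmp(key, "MemFree:"))
			free_kb = val;
		else if (!strcmp(key, "Buffers:"))
			buffers = val;
		else if (!strcmp(key, "Cached:"))
			cached = val;
	}
	fclose(f);

	printf("used - buffers/cache: %lu kB\n",
	       (total - free_kb) - buffers - cached);
	return 0;
}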
(In reply to Chen Yu from comment #27) > > I shall inform a Linux-forum about this (I promised to do so) and perhaps > > the VirtualBox-people. > It would be nice if you can help ask the VirtualBox people what 'Large Page' > does in their software. Well, I got second thoughts about informing the VirtualBox-guys. I would have had to create an Oracle account to file a bug-report/forum-post. And for my taste they wanted way too much private information from me for just that. But perhaps this info from the manual may be useful for you: "10.7 Nested paging and VPIDs In addition to “plain” hardware virtualization, your processor may also support additional sophisticated techniques:2 • A newer feature called “nested paging” implements some memory management in hardware, which can greatly accelerate hardware virtualization since these tasks no longer need to be performed by the virtualization software. With nested paging, the hardware provides another level of indirection when translating linear to physical addresses. Page tables function as before, but linear addresses are now translated to “guest physical” addresses first and not physical addresses directly. A new set of paging registers now exists under the traditional paging mechanism and translates from guest physical addresses to host physical addresses, which are used to access memory. Nested paging eliminates the overhead caused by VM exits and page table accesses. In essence, with nested page tables the guest can handle paging without intervention from the hypervisor. Nested paging thus significantly improves virtualization performance. On AMD processors, nested paging has been available starting with the Barcelona (K10) architecture – they call it now “rapid virtualization indexing” (RVI). Intel added support for nested paging, which they call “extended page tables” (EPT), with their Core i7 (Nehalem) processors. If nested paging is enabled, the VirtualBox hypervisor can also use large pages to reduce TLB usage and overhead. This can yield a performance improvement of up to 5%. To enable this feature for a VM, you need to use the VBoxManage modifyvm --largepages command; see chapter 8.8, VBoxManage modifyvm, page 131. • On Intel CPUs, another hardware feature called “Virtual Processor Identifiers” (VPIDs) can greatly accelerate context switching by reducing the need for expensive flushing of the processor’s Translation Lookaside Buffers (TLBs). To enable these features for a VM, you need to use the VBoxManage modifyvm --vtxvpid and --largepages commands; see chapter 8.8, VBoxManage modifyvm, page 131. 2VirtualBox 2.0 added support for AMD’s nested paging; support for Intel’s EPT and VPIDs was added with version 2.1."
Here's what I dug up from the current kernel:

grep -i hugepage .config
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION=y
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set

grep -i hugetlb .config
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_CGROUP_HUGETLB=y
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y

The VirtualBox manual says that "Large Pages: on|off" is for Intel VT-x only. So maybe for AMD CPUs nr_mapped cannot be reduced to a "normal" range.
Could it be that only part of nr_mapped is in Active(file)/Inactive(file), say 10 to 30 % or so?

And the problem comes from subtracting 100 %?

This is the status before and after a drop_caches. It's clear that not all of nr_mapped can be contained in the other two. And Active(file)/Inactive(file) may not belong 100 % to nr_mapped:

Before:

Active(file):     231984 kB
Inactive(file):   412508 kB
Mapped:          2802508 kB

After:

Active(file):     180420 kB
Inactive(file):    93280 kB
Mapped:          2802812 kB
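(The before/after numbers above are from /proc/meminfo; a small helper that snapshots the same counters from /proc/vmstat - note vmstat reports pages, not kB - might look like this, just a sketch:)

#include <stdio.h>
#include <string.h>

int main(void)
{
	static const char *keys[] = {
		"nr_mapped", "nr_active_file", "nr_inactive_file",
		"nr_active_anon", "nr_inactive_anon", "nr_slab_reclaimable",
	};
	char name[64];
	unsigned long val;
	size_t i;
	FILE *f = fopen("/proc/vmstat", "r");

	if (!f) {
		perror("/proc/vmstat");
		return 1;
	}
	while (fscanf(f, "%63s %lu", name, &val) == 2) {
		for (i = 0; i < sizeof(keys) / sizeof(keys[0]); i++)
			if (!strcmp(name, keys[i]))
				printf("%-20s %10lu pages (%lu kB)\n",
				       name, val, val * 4); /* assumes 4 kB pages */
	}
	fclose(f);
	return 0;
}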
(In reply to Jay from comment #31)
> Could it be that only part of nr_mapped is in Active(file)/Inactive(file),
> say 10 to 30 % or so?
>
> And the problem comes from subtracting 100 %?
>
> This is the status before and after a drop_caches. It's clear that not all
> of nr_mapped can be contained in the other two. And
> Active(file)/Inactive(file) may not belong 100 % to nr_mapped:
>
> Before:
>
> Active(file): 231984 kB
> Inactive(file): 412508 kB
> Mapped: 2802508 kB
>
> After:
>
> Active(file): 180420 kB
> Inactive(file): 93280 kB
> Mapped: 2802812 kB

I wanted to see what the values are in a real s2disk. So I changed int hibernate_preallocate_memory(void) a bit. Among other things, I placed a shrink_all_memory() at the beginning of it, before saveable etc. are calculated. What I get is like this:

PM: Preallocating image memory...
nr_mapped = 711877 pages, 2780 MB
reclaimable_anon_slab = 428366 pages, 1673 MB
reclaim active_inactive(file) = 370812 pages, 1448 MB

/* shrink_all_memory() */
nr_reclaimed = 336461 pages, 1314 MB
active_inactive(file) = 92412 pages, 360 MB
reclaimable_anon_slab = 361144 pages, 1410 MB

save_highmem = 0 pages, 0 MB
saveable = 1419169 pages, 5543 MB
...

active_inactive(file) *after* shrink_all_memory() was just 360 MB, which means that only 25 % of it was *not* reclaimed. With respect to nr_mapped, the ratio is 360/2780, which is just 13 %. So in this case only 13 % of nr_mapped should have been subtracted in minimum_image_size(), but 100 % was. And that's too much.

When nr_mapped is in a normal range, the ratio of active_inactive(file)/nr_mapped after shrink_all_memory() is closer to 1, like here:

PM: Preallocating image memory...
nr_mapped = 128712 pages, 502 MB
reclaimable_anon_slab = 366529 pages, 1431 MB
reclaim active_inactive(file) = 184209 pages, 719 MB

nr_reclaimed = 95499 pages, 373 MB
active_inactive(file) = 96187 pages, 375 MB
reclaimable_anon_slab = 349050 pages, 1363 MB
...

375/502 = 0.75. In such cases it seems acceptable if nr_mapped is fully subtracted. But I think it's not possible to determine whether nr_mapped is in a "normal" range or too high. So an nr_mapped-free way to calculate minimum_image_size() would be preferable, IMO.

As an alternative to just excluding nr_mapped, I'm testing this now:

- shrink_all_memory("value of active_inactive(file)") at the beginning of int hibernate_preallocate_memory()

- what remains of active_inactive(file) cannot be reclaimed and is thus excluded from minimum_image_size(), along with nr_mapped.

minimum_image_size() looks like this:

static unsigned long minimum_image_size(unsigned long saveable)
{
	unsigned long size;

	size = global_page_state(NR_SLAB_RECLAIMABLE)
		+ global_page_state(NR_ACTIVE_ANON)
		+ global_page_state(NR_INACTIVE_ANON);

	return saveable - size;
}

(I guess it's not possible that saveable can be <= size, so I simplified the return statement.)

So far it's OK.
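In other words, the experiment described above would look roughly like this at the top of hibernate_preallocate_memory() (only a sketch of the idea, not the actual patch):

	/* sketch: reclaim the file LRUs first, then treat whatever is left
	 * on them as non-freeable, so minimum_image_size() only counts
	 * reclaimable slab + anon as freeable */
	unsigned long file_lru = global_page_state(NR_ACTIVE_FILE)
				+ global_page_state(NR_INACTIVE_FILE);

	nr_reclaimed = shrink_all_memory(file_lru);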
(In reply to Jay from comment #31)
> Could it be that only part of nr_mapped is in Active(file)/Inactive(file),
> say 10 to 30 % or so?
>
Indeed. Some mmapped file pages cannot be reclaimed, so they are not counted in Active/Inactive(file). For example, there is also an 'Unevictable' LRU besides the Active and Inactive LRUs, and some mmapped pages might be moved from the Active/Inactive(file) LRUs to the 'Unevictable' LRU. I also asked some people on the mm list for help, and they confirmed that:

"There are also unreclaimable pages which wouldn't be accounted for in NR_ACTIVE_FILE/NR_INACTIVE_FILE. I can see those getting accounted for in NR_FILE_MAPPED, though."

One of them also suspects there is a bug in the VirtualBox binary driver that increases the number of nr_mapped. Anyway, we should deal with the 'imprecise' nr_mapped.
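One user-space way to produce exactly that situation (a sketch; it needs mlock permission, and the file name is made up) is to mmap a file and mlock the mapping - the pages then end up on the unevictable LRU while staying counted in NR_FILE_MAPPED:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
	const char *path = "some_file.dat";	/* made-up example file */
	struct stat st;
	int fd = open(path, O_RDONLY);

	if (fd < 0 || fstat(fd, &st) < 0) {
		perror(path);
		return 1;
	}

	char *addr = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	if (addr == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* faults the pages in, maps them (Mapped: goes up) and moves them
	 * to the unevictable LRU, off Active/Inactive(file) */
	if (mlock(addr, st.st_size) != 0)
		perror("mlock");

	getchar();	/* inspect Mapped: and Unevictable: in /proc/meminfo now */

	munlock(addr, st.st_size);
	munmap(addr, st.st_size);
	close(fd);
	return 0;
}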
(In reply to Jay from comment #32)
> (In reply to Jay from comment #31)
>
> As an alternative to just excluding nr_mapped, I'm testing this now:
>
> - shrink_all_memory("value of active_inactive(file)") at the beginning of
> int hibernate_preallocate_memory()
>
> - what remains of active_inactive(file) cannot be reclaimed and thus is
> excluded from minimum_image_size(), along with nr_mapped.
>
This breaks the original intent of minimum_image_size: the reason minimum_image_size was introduced is that the user doesn't want to reclaim too many pages. If too much page cache is reclaimed, then after the system resumes all of that page cache has to be re-read, and resuming from hibernation takes too much time. You can check the code comment for minimum_image_size.
(In reply to Chen Yu from comment #33) > (In reply to Jay from comment #31) > > Could it be that only part of nr_mapped is in Active(file)/Inactive(file), > > say 10 to 30 % or so? > > > Indeed. some mmap file can not be reclaimed, so it is not counted in > Active/Inactive(file), for example, there is also a 'Unevictable' lru besides > Active LRU and InActive LRU, some mmap pages might be moved from > Active/Inactive(file) lru to 'Unevictable' lru. I also wrote for help to > some people on mm list, they confirmed that: > "There are also unreclaimable pages which wouldn't be accounted for in > NR_ACTIVE_FILE/NR_INACTIVE_FILE. I can see those getting accounted for in > NR_FILE_MAPPED, though." > And he also suspect there is a bug in VirtualBox binary driver that increse > the number of nr_mapped. > Anyway, we should deal with the 'imprecise' nr_mapped. A "theoretical" shrink_all_memory(in_active(file)) would be nice, so that we could use the return-value as the "true" nr_mapped. Or is there an alternative to nr_mapped? A (perhaps derived) value that can not be confused by VirtualBox?
(In reply to Chen Yu from comment #34) > (In reply to Jay from comment #32) > > (In reply to Jay from comment #31) > > > > As an alternative to just excluding nr_mapped, I'm testing this now: > > > > - shrink_all_memory("value of active_inactive(file") at the beginning of > int > > hibernate_preallocate_memory() > > > > - what remains of active_inactive(file) cannot be reclaimed and thus is > > excluded from minimum_image_size(), along with nr_mapped. > > > This breaks the orginal context of minimum_image_size: > the reason why minimum_image_size is introduced is because user doesn't want > to reclaim too much pages, if too many page cache are relaimed , after the > system resumes, all the page cache will be re-read in, and it cost too much > time to resume from hibernation, you can check the code comment for > minimum_image_size. Maybe the resumes take a bit longer, hard to say. No problem here but may depend on hardware. On the other hand the s2disk-process seems to be a bit faster. So, on balance... But my primary intention was to find out about the "true" value of those bloated nr_mapped.
... > But I think it's not possible to determine whether nr_mapped is in a > "normal" range or too high. ... I may have to recant. Surprisingly, (falsely) high nr_mapped-values may be detected through the ratio of reclaimable anon_slab to nr_mapped. Values < 2 indicate unusually high nr_mapped and values < 1 signal that we may be close to failure of s2disk (see table at bottom). This might be used for (yet) another workaround: /* exclude nr_mapped if unusually and probably falsely high */ static unsigned long minimum_image_size(unsigned long saveable) { unsigned long size, nr_mapped; nr_mapped = global_page_state(NR_FILE_MAPPED); size = global_page_state(NR_SLAB_RECLAIMABLE) + global_page_state(NR_ACTIVE_ANON) + global_page_state(NR_INACTIVE_ANON); if (size / nr_mapped < 1) nr_mapped = 0; size += global_page_state(NR_ACTIVE_FILE) + global_page_state(NR_INACTIVE_FILE) - nr_mapped; return saveable <= size ? 0 : saveable - size; } Only very high nr_mapped are excluded. Otherwise it operates like the original. ************************************************* Normal nr_mapped: PM: Preallocating image memory... nr_mapped = 185056 pages, 722 MB reclaimable anon_slab = 454501 pages, 1775 MB reclaimable active_inactive(file) = 720303 pages, 2813 MB save_highmem = 0 pages, 0 MB saveable = 1859611 pages, 7264 MB ... minimum_pages, condit. excluding nr_mapped = 869863 pages, 3397 MB minimum_pages, orig. version = 869863 pages, 3397 MB minimum_pages, excl. nr_mapped = 684807 pages, 2675 MB ... High nr_mapped: PM: Preallocating image memory... nr_mapped = 851944 pages, 3327 MB reclaimable anon_slab = 370605 pages, 1447 MB reclaimable active_inactive(file) = 651405 pages, 2544 MB save_highmem = 0 pages, 0 MB saveable = 1903696 pages, 7436 MB ... minimum_pages, condit. excluding nr_mapped = 881686 pages, 3444 MB minimum_pages, orig. version = 1733630 pages, 6771 MB /* Orig. fails */ minimum_pages, excl. nr_mapped = 881686 pages, 3444 MB ... High nr_mapped and high anon_slab (gitk): PM: Preallocating image memory... nr_mapped = 654991 pages, 2558 MB reclaimable anon_slab = 1272056 pages, 4968 MB reclaimable active_inactive(file) = 799541 pages, 3123 MB save_highmem = 0 pages, 0 MB saveable = 2758432 pages, 10775 MB ... minimum_pages, condit. excluding nr_mapped = 1341913 pages, 5241 MB minimum_pages, orig. version = 1341913 pages, 5241 MB minimum_pages, excl. nr_mapped = 686906 pages, 2683 MB ... ************************************************* A: nr_mapped; B: reclaimable_anon_slab; ratio B/A A B ratio 506 5.082 10,0 573 4.823 8,4 687 2.262 3,3 613 1.919 3,1 589 1.832 3,1 482 1.488 3,1 582 1.759 3,0 597 1.745 2,9 614 1.762 2,9 502 1.431 2,9 634 1.747 2,8 664 1.803 2,7 606 1.616 2,7 617 1.561 2,5 693 1.710 2,5 565 1.315 2,3 725 1.669 2,3 615 1.408 2,3 592 1.296 2,2 1.505 1.939 1,3 2.535 3.219 1,3 1.646 1.917 1,2 1.726 1.980 1,1 1.628 1.830 1,1 1.570 1.712 1,1 1.563 1.612 1,0 1.176 1.197 1,0 2.758 1.712 0,6 2.780 1.673 0,6 2.659 1.472 0,6 2.625 1.286 0,5 2.660 1.189 0,4 3.482 1.167 0,3
(In reply to Jay from comment #37) > /* exclude nr_mapped if unusually and probably falsely high */ > static unsigned long minimum_image_size(unsigned long saveable) > { > unsigned long size, nr_mapped; > > nr_mapped = global_page_state(NR_FILE_MAPPED); > > size = global_page_state(NR_SLAB_RECLAIMABLE) > + global_page_state(NR_ACTIVE_ANON) > + global_page_state(NR_INACTIVE_ANON); > > if (size / nr_mapped < 1) > nr_mapped = 0; > > size += global_page_state(NR_ACTIVE_FILE) > + global_page_state(NR_INACTIVE_FILE) > - nr_mapped; > > return saveable <= size ? 0 : saveable - size; > } > At times... This if (size / nr_mapped < 1) nr_mapped = 0; should better have been this: if (size < nr_mapped) nr_mapped = 0;
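[Editorial aside on the two tests, not from the patch itself: with unsigned integer division, size / nr_mapped < 1 is true exactly when size < nr_mapped, so both forms select the same cases as long as nr_mapped is non-zero; the direct comparison additionally avoids a division by zero if NR_FILE_MAPPED ever happens to be 0. A tiny userspace sketch of the equivalence:

#include <stdio.h>

int main(void)
{
	unsigned long size = 300000, nr_mapped = 500000;

	/* Unsigned division truncates, so size / nr_mapped == 0
	 * exactly when size < nr_mapped (for nr_mapped > 0). */
	if (nr_mapped && size / nr_mapped < 1)
		printf("division form:   would exclude nr_mapped\n");

	/* Same selection, but safe even for nr_mapped == 0. */
	if (size < nr_mapped)
		printf("comparison form: would exclude nr_mapped\n");

	return 0;
}
]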
(In reply to Chen Yu from comment #33) > And he also suspect there is a bug in VirtualBox binary driver that increse > the number of nr_mapped. Perhaps another indication for that is the discrepancy between what the manual states about a performance-gain through "Large pages: on" and what I see here. The manual says: "[...]If nested paging is enabled, the VirtualBox hypervisor can also use large pages to reduce TLB usage and overhead. This can yield a performance improvement of up to 5%.[...]" But what I see here is the opposite: The VMs are a tad quicker with large pages "off". So perhaps they simply mixed up "on" with "off" somewhere in their code.
(In reply to Jay from comment #39) > (In reply to Chen Yu from comment #33) > > > And he also suspect there is a bug in VirtualBox binary driver that increse > > the number of nr_mapped. > > Perhaps another indication for that is the discrepancy between what the > manual states about a performance-gain through "Large pages: on" and what I > see here. The manual says: > > "[...]If nested paging is enabled, the VirtualBox hypervisor can also use > large pages to reduce > TLB usage and overhead. This can yield a performance improvement of up to > 5%.[...]" > > But what I see here is the opposite: The VMs are a tad quicker with large > pages "off". > > So perhaps they simply mixed up "on" with "off" somewhere in their code. 'Nested paging' is a hardware feature that reduces the page-table overhead for the VM guest. If it is in an "off" state, the Linux host has to use more page tables to translate guest PAs to actual PAs (PA stands for physical address). So nr_mapped might be related to the host's page tables, but I don't know the exact place.
(In reply to Jay from comment #38) > (In reply to Jay from comment #37) > > > /* exclude nr_mapped if unusually and probably falsely high */ > > static unsigned long minimum_image_size(unsigned long saveable) > > { > > unsigned long size, nr_mapped; > > > > nr_mapped = global_page_state(NR_FILE_MAPPED); > > > > size = global_page_state(NR_SLAB_RECLAIMABLE) > > + global_page_state(NR_ACTIVE_ANON) > > + global_page_state(NR_INACTIVE_ANON); > > > > if (size / nr_mapped < 1) > > nr_mapped = 0; > > > > size += global_page_state(NR_ACTIVE_FILE) > > + global_page_state(NR_INACTIVE_FILE) > > - nr_mapped; > > > > return saveable <= size ? 0 : saveable - size; > > } > > > > At times... > > This > > if (size / nr_mapped < 1) > nr_mapped = 0; > > should better have been this: > > if (size < nr_mapped) > nr_mapped = 0; In order to sell this solution, we need to convince the maintainers why we did it like this - is it an empirical formula or is it based on concrete evidence? In most cases the latter is more persuasive :) BTW, do you know whether kvm can also reproduce this problem?
... > > In order to sell this solution out, we need to convince maintainers why we > did like this, is it a experience formula or based on a certain evidence, in > most cases the latter would be more persuasive :) > Experience and evidence, I think. That excluding nr_mapped works, is a proven fact for me now. And that size < nr_mapped may be a good way to catch the cases of problematically high nr_mapped is based on the data that I've analyzed so far (see the table in Comment 37). What I can say is that under normal circumstances (i.e. no bloated nr_mapped due to a VB-VM) the sum of global_page_state(NR_SLAB_RECLAIMABLE) + global_page_state(NR_ACTIVE_ANON) + global_page_state(NR_INACTIVE_ANON) is never < nr_mapped on my system. And I suppose this is also the case on other systems. But I do not know WHY it is so. I'm just observing and using it. Providing the explanation ("Normally, the sum of... is > nr_mapped because...") would be your part. ;) But before submitting the idea we should try to make sure that we're not just dealing with a bug in VirtualBox. I would have already filed a bug report but for that they want your name, address, phone-no. etc. And I don't like that. But perhaps you could give your colleague at Oracle a call and point him to this thread? Surely they must be interested in an opportunity to improve their product - as well as good relations with Intel? ;) > BTW, do you know if using kvm could also reproduce this problem? I haven't used kvm so far. But perhaps Aaron Lu has. He's on the sunny side of the street because he uses a virt.-app that makes use of anon-pages instead of nr_mapped (see https://bugzilla.kernel.org/show_bug.cgi?id=47931#c56).
Right, qemu makes use of anonymous pages instead of mapped.
(In reply to Jay from comment #42) > due to a VB-VM) the sum of > > global_page_state(NR_SLAB_RECLAIMABLE) > + global_page_state(NR_ACTIVE_ANON) > + global_page_state(NR_INACTIVE_ANON) > > is never < nr_mapped on my system. And I suppose this is also the case on > other systems. Unfortunately I have to say that ANON+SLAB does not have any relationship with NR_MAPPED in the source code... they are just two separate kinds of pages. > But perhaps you could give your colleague at Oracle a call and point him to > this thread? Surely they must be interested in an opportunity to improve > their product - as well as good relations with Intel? ;) I don't have a contact at Oracle, but I'll try VirtualBox on my side - hopefully it's not too hard.
(In reply to Aaron Lu from comment #43) > Right, qemu makes use of anonymous pages instead of mapped. Hi Aaron, I remember Jay mentioned that he is using VirtualBox's standard format for the VM's virtual hard disk - can kvm emulate a hard disk too?
(In reply to Chen Yu from comment #45) > (In reply to Aaron Lu from comment #43) > > Right, qemu makes use of anonymous pages instead of mapped. > Hi Aaron, I remember Jay has mentioned that he is using a VirtualBox's > standard-format for the VM's virtual harddisk, can kvm simulate a harddisk > too? Of course, qemu is capable of using multiple formats of hard disks and I have been using qcow2.
(In reply to Chen Yu from comment #44) > (In reply to Jay from comment #42) > > due to a VB-VM) the sum of > > > > global_page_state(NR_SLAB_RECLAIMABLE) > > + global_page_state(NR_ACTIVE_ANON) > > + global_page_state(NR_INACTIVE_ANON) > > > > is never < nr_mapped on my system. And I suppose this is also the case on > > other systems. > Unfortunately I have to say, ANON+SLAB does not have any relationship with > NR_MAPPED in the source code... they are just two seperated kinds of pages. Tough luck, then. But it's still the best indicator I have found. So I'll continue to use it. > > But perhaps you could give your colleague at Oracle a call and point him to > > this thread? Surely they must be interested in an opportunity to improve > > their product - as well as good relations with Intel? ;) > I don't have a stool pigeon in Oracle, but I'll try VirtualBox on my side, > hope it is not too hard.. It's easy. Once you have created a VM, you can use the following commands to check for and set "Large Pages: on/off ~> VBoxManage showvminfo "name of VM" --details ~> VBoxManage modifyvm "name of VM" --largepages on
OK, I can reproduce it on VirtualBox now - interesting..
(In reply to Chen Yu from comment #48) > ok, I can reproduced it on virtualbox now, interesting.. Do you think it's a bug or just a quirk? I've posted a few words about possible problems with s2disk and VirtualBox and the fix through "large pages: on" in two openSuse-forums. That info can be found by googling and should help people with Intel-CPUs. However, "large pages: on" won't work for AMD-CPUs (acc. to the VB-manual). But are they affected by high nr_mapped at all? And if they are - and if high nr_mapped is just a quirk of VirtualBox - would that warrant a change in the kernel-code? In other words: how shall we proceed?
(In reply to Jay from comment #49) > (In reply to Chen Yu from comment #48) > > ok, I can reproduced it on virtualbox now, interesting.. > > Do you think it's a bug or just a quirk? > > I've posted a few words about possible problems with s2disk and VirtualBox > and the fix through "large pages: on" in two openSuse-forums. That info can > be found by googling and should help people with Intel-CPUs. > > However, "large pages: on" won't work for AMD-CPUs (acc. to the VB-manual). > But are they affected by high nr_mapped at all? > > And if they are - and if high nr_mapped is just a quirk of VirtualBox - > would that warrant a change in the kernel-code? > > In other words: how shall we proceed? I've looked through some of the VirtualBox driver code, but unfortunately I haven't found any piece of code that would manipulate the nr_mapped counter, so I suspect there is some black magic inside Linux itself. I don't have much bandwidth for this at the moment, but I'll return to this thread later.
(In reply to Chen Yu from comment #50) snip > > I've looked through some of the virtualbox driver code, but unfortunately I > haven't found any piece of code would manipulate the counter of nr_mapped, > so I suspect there should be some black magic inside the linux, although I > do not have much bandwidth on this recently, I'll return to this thread > later. Maybe this is obvious to you - but just to have mentioned it: A while ago I've noticed that "large pages: on" reduces the amount of memory added to "mapped" by the factor 8. So a 2048-MB-VM adds only around 256 MB to mapped with "large pages:on" but 2048 MB with "large pages:off". Perhaps "on" sets the page size to 4096 bytes and "off" to 512.
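[Editorial aside, not established in this thread: on x86 the base page size is 4 KiB and the large pages used with nested paging are 2 MiB, a ratio of 512, so the observed factor of 8 in Mapped does not line up with a page-size ratio alone; it may simply reflect how much of the guest's memory actually ends up backed by large pages.]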
Maybe I have found a way to get rid of that nr_mapped-problem. By reclaiming the cache at the beginning of the second part of hibernate_preallocate_memory() - which is entered only in high-mem-load cases - we can exclude the cache-part from minimum_image_size(). In low- to medium-load cases (probably the majority) everything stays as is. In high-load cases a shrink_all_memory() is done anyway in the original function - so no major difference. Only that now it is done earlier and in a slightly modified form. It looks like this: ... if (size >= saveable) { pages = preallocate_image_highmem(save_highmem); pages += preallocate_image_memory(saveable - pages, avail_normal); goto out; } /* reclaim at least the cache and adjust saveable */ to_reclaim = max(reclaimable_file, saveable - size); saveable -= shrink_all_memory(to_reclaim); /* Estimate the minimum size of the image. */ pages = minimum_image_size(saveable); ... And minimum_image_size() is now free of nr_mapped: static unsigned long minimum_image_size(unsigned long saveable) { unsigned long size; size = global_page_state(NR_SLAB_RECLAIMABLE) + global_page_state(NR_ACTIVE_ANON) + global_page_state(NR_INACTIVE_ANON); return saveable <= size ? 0 : saveable - size; } Needs further testing but results so far look good. s2disks and resumes in higher-load cases are at least as fast as with the orig. For example a high-load case (2-GB-VM + gitk + others): PM: Preallocating image memory... nr_mapped = 631808 pages, 2468 MB active_inactive(file) = 774156 pages, 3024 MB reclaimable_anon_slab = 1150294 pages, 4493 MB save_highmem = 0 pages, 0 MB saveable = 2610116 pages, 10195 MB highmem = 0 pages, 0 MB additional_pages = 220 pages, 0 MB avail_normal = 3061450 pages, 11958 MB count = 3023247 pages, 11809 MB max_size = 1510489 pages, 5900 MB user_specified_image_size = 1349734 pages, 5272 MB adjusted_image_size = 1349735 pages, 5272 MB to_reclaim = 1260381 pages, 4923 MB nr_reclaimed = 604510 pages, 2361 MB saveable = 2005606 pages, 7834 MB active_inactive(file) = 206154 pages, 805 MB reclaimable_anon_slab = 1108223 pages, 4328 MB minimum_pages = 897383 pages, 3505 MB target_image_size = 1349735 pages, 5272 MB preallocated_high_mem = 0 pages, 0 MB to_alloc = 1512758 pages, 5909 MB to_alloc_adjusted = 1512758 pages, 5909 MB pages_allocated = 1512758 pages, 5909 MB done (allocated 1673512 pages) PM: Allocated 6694048 kbytes in 10.08 seconds (664.09 MB/s) Normal-load: PM: Preallocating image memory... nr_mapped = 627429 pages, 2450 MB active_inactive(file) = 379814 pages, 1483 MB reclaimable_anon_slab = 185542 pages, 724 MB save_highmem = 0 pages, 0 MB saveable = 1247975 pages, 4874 MB highmem = 0 pages, 0 MB additional_pages = 220 pages, 0 MB avail_normal = 3061315 pages, 11958 MB count = 3023112 pages, 11809 MB max_size = 1510422 pages, 5900 MB user_specified_image_size = 1349734 pages, 5272 MB adjusted_image_size = 1349735 pages, 5272 MB done (allocated 1247975 pages) PM: Allocated 4991900 kbytes in 0.35 seconds (14262.57 MB/s)
The approach outlined above works. I have modified it a bit, mainly because shrink_all_memory() does not always deliver at the first call what it is asked for. It is also not overly precise, +/- 30% or more off target are not unusual. Omitting the debug-statements, it now looks like this: ... if (saveable > size) saveable -= shrink_all_memory(saveable - size); /* * If the desired number of image pages is at least as large as the * current number of saveable pages in memory, allocate page frames for * the image and we're done. */ if (saveable <= size) { pages = preallocate_image_highmem(save_highmem); pages += preallocate_image_memory(saveable - pages, avail_normal); goto out; } /* reclaim at least the cache and adjust saveable */ reclaimable_file = sum_in_active_file(); to_reclaim = max(reclaimable_file, saveable - size); saveable -= shrink_all_memory(to_reclaim); /* Estimate the minimum size of the image. */ pages = minimum_image_size(saveable); ... This works reliably and irrespective of nr_mapped-values. It also seems consistent to me. So far I haven't seen negative effects like prolonged resumes or so. So I finally consider the problem solved for me. As I have also published the way how to avoid high nr_mapped-values for VirtualBox-VMs at the outset, other users can find a simple solution too. But perhaps I'm the only one anyway who uses s2disk while running VirtualBox-VMs. ;) So long! Jay Examples Baseline after reboot: PM: Preallocating image memory... nr_mapped = 109655 pages, 428 MB active_inactive(file) = 193346 pages, 755 MB reclaimable_anon_slab = 379733 pages, 1483 MB save_highmem = 0 pages, 0 MB saveable = 688159 pages, 2688 MB highmem = 0 pages, 0 MB additional_pages = 220 pages, 0 MB avail_normal = 3061318 pages, 11958 MB count = 3023115 pages, 11809 MB max_size = 1510423 pages, 5900 MB user_specified_image_size = 1349734 pages, 5272 MB adjusted_image_size = 1349735 pages, 5272 MB done (allocated 688159 pages) PM: Allocated 2752636 kbytes in 0.26 seconds (10587.06 MB/s) ************************ Medium load, high nr_mapped: PM: Preallocating image memory... nr_mapped = 652131 pages, 2547 MB active_inactive(file) = 526683 pages, 2057 MB reclaimable_anon_slab = 403426 pages, 1575 MB save_highmem = 0 pages, 0 MB saveable = 1615454 pages, 6310 MB highmem = 0 pages, 0 MB additional_pages = 220 pages, 0 MB avail_normal = 3061294 pages, 11958 MB count = 3023091 pages, 11808 MB max_size = 1510411 pages, 5900 MB user_specified_image_size = 1349734 pages, 5272 MB adjusted_image_size = 1349735 pages, 5272 MB saveable = 1209334 pages, 4723 MB done (allocated 1209334 pages) PM: Allocated 4837336 kbytes in 1.05 seconds (4606.98 MB/s) ... PM: Need to copy 1202273 pages PM: Hibernation image created (1202273 pages copied) *********************** Medium load, normal nr_mapped: PM: Preallocating image memory... nr_mapped = 137698 pages, 537 MB active_inactive(file) = 689275 pages, 2692 MB reclaimable_anon_slab = 379854 pages, 1483 MB save_highmem = 0 pages, 0 MB saveable = 1752291 pages, 6844 MB highmem = 0 pages, 0 MB additional_pages = 220 pages, 0 MB avail_normal = 3061418 pages, 11958 MB count = 3023215 pages, 11809 MB max_size = 1510473 pages, 5900 MB user_specified_image_size = 1349734 pages, 5272 MB adjusted_image_size = 1349735 pages, 5272 MB saveable = 1347312 pages, 5262 MB done (allocated 1347312 pages) PM: Allocated 5389248 kbytes in 0.61 seconds (8834.83 MB/s) ... PM: Need to copy 1341278 pages ************************ High load, high nr_mapped: PM: Preallocating image memory... 
nr_mapped = 843359 pages, 3294 MB active_inactive(file) = 695509 pages, 2716 MB reclaimable_anon_slab = 416732 pages, 1627 MB save_highmem = 0 pages, 0 MB saveable = 1992456 pages, 7783 MB highmem = 0 pages, 0 MB additional_pages = 220 pages, 0 MB avail_normal = 3061320 pages, 11958 MB count = 3023117 pages, 11809 MB max_size = 1510424 pages, 5900 MB user_specified_image_size = 1349734 pages, 5272 MB adjusted_image_size = 1349735 pages, 5272 MB saveable = 1358617 pages, 5307 MB to_reclaim = 100962 pages, 394 MB nr_reclaimed = 107933 pages, 421 MB saveable = 1250684 pages, 4885 MB active_inactive(file) = 47934 pages, 187 MB reclaimable_anon_slab = 313496 pages, 1224 MB minimum_pages = 937188 pages, 3660 MB target_image_size = 1349735 pages, 5272 MB preallocated_high_mem = 0 pages, 0 MB to_alloc = 1512693 pages, 5908 MB to_alloc_adjusted = 1512693 pages, 5908 MB pages_allocated = 1512693 pages, 5908 MB done (allocated 1673382 pages) PM: Allocated 6693528 kbytes in 2.21 seconds (3028.74 MB/s) ... PM: Need to copy 1246806 pages PM: Hibernation image created (1246806 pages copied) ********************* High load, normal nr_mapped: PM: Preallocating image memory... nr_mapped = 128972 pages, 503 MB active_inactive(file) = 1018633 pages, 3979 MB reclaimable_anon_slab = 443409 pages, 1732 MB save_highmem = 0 pages, 0 MB saveable = 2147765 pages, 8389 MB highmem = 0 pages, 0 MB additional_pages = 220 pages, 0 MB avail_normal = 3061345 pages, 11958 MB count = 3023142 pages, 11809 MB max_size = 1510437 pages, 5900 MB user_specified_image_size = 1349734 pages, 5272 MB adjusted_image_size = 1349735 pages, 5272 MB saveable = 1364570 pages, 5330 MB to_reclaim = 271074 pages, 1058 MB nr_reclaimed = 225778 pages, 881 MB saveable = 1138792 pages, 4448 MB active_inactive(file) = 78415 pages, 306 MB reclaimable_anon_slab = 362997 pages, 1417 MB minimum_pages = 775795 pages, 3030 MB target_image_size = 1349735 pages, 5272 MB preallocated_high_mem = 0 pages, 0 MB to_alloc = 1512705 pages, 5909 MB to_alloc_adjusted = 1512705 pages, 5909 MB pages_allocated = 1512705 pages, 5909 MB done (allocated 1673407 pages) PM: Allocated 6693628 kbytes in 1.71 seconds (3914.40 MB/s) ... PM: Need to copy 1127012 pages PM: Hibernation image created (1127012 pages copied) **********************
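[For anyone trying to follow the code above: sum_in_active_file() is not a stock kernel helper, so the following is only a guess at its definition, inferred from the "active_inactive(file)" lines in the debug output, which track the two file LRUs:

/* Presumed helper - not shown in the thread, inferred from the logs. */
static unsigned long sum_in_active_file(void)
{
	return global_page_state(NR_ACTIVE_FILE)
	     + global_page_state(NR_INACTIVE_FILE);
}
]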
(In reply to Jay from comment #53) > The approach outlined above works. I have modified it a bit, mainly because > shrink_all_memory() does not always deliver at the first call what it is > asked for. It is also not overly precise, +/- 30% or more off target are not > unusual. > > Omitting the debug-statements, it now looks like this: > > ... > if (saveable > size) > saveable -= shrink_all_memory(saveable - size); > > /* > * If the desired number of image pages is at least as large as the > * current number of saveable pages in memory, allocate page frames for > * the image and we're done. > */ > if (saveable <= size) { > pages = preallocate_image_highmem(save_highmem); > pages += preallocate_image_memory(saveable - pages, > avail_normal); > goto out; > } > > /* reclaim at least the cache and adjust saveable */ > reclaimable_file = sum_in_active_file(); > to_reclaim = max(reclaimable_file, saveable - size); > saveable -= shrink_all_memory(to_reclaim); > > /* Estimate the minimum size of the image. */ > pages = minimum_image_size(saveable); > ... > > This works reliably and irrespective of nr_mapped-values. It also seems > consistent to me. So far I haven't seen negative effects like prolonged > resumes or so. > > So I finally consider the problem solved for me. > > As I have also published the way how to avoid high nr_mapped-values for > VirtualBox-VMs at the outset, other users can find a simple solution too. > > But perhaps I'm the only one anyway who uses s2disk while running > VirtualBox-VMs. ;) > > So long! > Thanks for your continuous effort, I'm back and as you are doing testing I will try to figure out where the nr_map exactly comes from.
(In reply to Chen Yu from comment #54) > (In reply to Jay from comment #53) snip > Thanks for your continuous effort, I'm back and as you are doing testing I > will try to figure out where the nr_map exactly comes from. That's probably not going to be easy, I think. Good luck, then! Don't know if this offers you a clue but I have noticed 2 or 3 times after having started a second VM that nr_mapped was higher than expected, even though both VMs were set to "large pages: on". But I can not reproduce this and there wasn't anything obviously special about the situation where it happened, perhaps (!) relatively high memory-load.
As I understand it, the idea behind subtracting nr_mapped from the file cache in minimum_image_size() is to weed out the common set of the two (also see Comment 5). Which implies that nr_mapped should not be greater than the file cache. But take a look at the following numbers. They were obtained immediately after a "sync && sysctl vm.drop_caches=1". What's left of the cache now, should be the common set (maximally). In the first case only the desktop-environment, 2 terminal-windows and an editor were loaded. Even there nr_mapped was greater than the file cache and subtracting it from the latter would have given a wrong result. More so in the second case. So even with "normal" nr_mapped-values, using nr_mapped in the calculation of minimum_image_size() may produce wrong results. PM: Preallocating image memory... nr_mapped = 71202 pages, 278 MB active_inactive(file) = 47595 pages, 185 MB reclaimable_anon_slab = 223598 pages, 873 MB save_highmem = 0 pages, 0 MB saveable = 381520 pages, 1490 MB ... PM: Preallocating image memory... nr_mapped = 208641 pages, 815 MB active_inactive(file) = 80216 pages, 313 MB reclaimable_anon_slab = 446329 pages, 1743 MB save_highmem = 0 pages, 0 MB saveable = 1689922 pages, 6601 MB ...
(In reply to Jay from comment #56) > As I understand it, the idea behind subtracting nr_mapped from the file > cache in minimum_image_size() is to weed out the common set of the two (also > see Comment 5). Which implies that nr_mapped should not be greater than the > file cache. > > But take a look at the following numbers. They were obtained immediately > after a "sync && sysctl vm.drop_caches=1". What's left of the cache now, > should be the common set (maximally). > > In the first case only the desktop-environment, 2 terminal-windows and an > editor were loaded. Even there nr_mapped was greater than the file cache and > subtracting it from the latter would have given a wrong result. More so in > the second case. > > So even with "normal" nr_mapped-values, using nr_mapped in the calculation > of minimum_image_size() may produce wrong results. > ... It was reproducible in KDE, xfce and after logging in to just an X-terminal-session: after drop_caches, nr_mapped was always greater than the filecache (47668 kB vs. 29364 kB in the last case). That means that even in a basic environment nr_mapped can be greater than the true number of mapped filecache pages. I think this speaks against subtracting nr_mapped from the filecache as a means to get the number of reclaimable pages of the cache (s. Comment 5). In most cases the resulting error will be relatively small and veiled by the fact that filecache + anon pages is usually much greater than nr_mapped. But the more nr_mapped deviates from the true number of mapped filecache pages, the greater the error - up to unnecessary failure. If it is not possible (or desirable) to calculate NR_FILE_MAPPED in a way that it truly represents the number of mapped filecache pages, I basically see three alternatives:

1) Leave everything as is
   - accept a certain inaccuracy
   - accept possible failure in case of high nr_mapped-values

2) Exclude nr_mapped from the calculation of minimum_image_size()
   - also accept a certain inaccuracy
   - no failure due to high nr_mapped
   - as simple as it gets
   - thoroughly tested (by me)

3) Remove "cache - nr_mapped" from minimum_image_size(), let the system reclaim the cache instead
   - probably the most accurate results
   - no failure due to high nr_mapped
   - rather simple
   - no failure or negatives so far
(In reply to Jay from comment #57) > It was reproducible in KDE, xfce and after logging in to just an > X-terminal-session: after drop_caches, nr_mapped was always greater than the > filecache (47668 kB vs. 29364 kB in the last case). > > That means that even in a basic environment nr_mapped can be greater than > the true number of mapped filecache pages. > > I think this speaks against subtracting nr_mapped from the filecache as a > means to get the number of reclaimable pages of the cache (s. Comment 5). > So you mean this problem can be reproduced without VirtualBox? Then I'd rather treat this as a pure hibernation bug. But is there an easy way to reproduce it? Could you attach your meminfo before/after drop_caches? And did you echo 3 into drop_caches to reproduce it? Anyway, what concerns me is the root cause - what nr_mapped really means to Linux - and once that is figured out, I'd recommend you propose a patch to fix it in the code. What do you think?
(In reply to Chen Yu from comment #58) > (In reply to Jay from comment #57) > > It was reproducible in KDE, xfce and after logging in to just an > > X-terminal-session: after drop_caches, nr_mapped was always greater than > the > > filecache (47668 kB vs. 29364 kB in the last case). > > > > That means that even in a basic environment nr_mapped can be greater than > > the true number of mapped filecache pages. > > > > I think this speaks against subtracting nr_mapped from the filecache as a > > means to get the number of reclaimable pages of the cache (s. Comment 5). > > > So you mean this problem can be reproduced without virtualbox? Then I'd > rather treat this as a pure hibernation bug. But is there any easy way to > reproduce, could you attach your meminfo before/after drop_cache? And did > you echo 3 to drop_cache to reproduce it? I have booted the kernel without all VirtualBox-modules and then logged in to a "failsafe" session (just an X-terminal) and found this before and after "sync && sysctl vm.drop_caches=3": before after active(file): 107376 14048 kB inactive(file): 252440 17868 kB Mapped: 47488 47460 kB Even without anything of VirtualBox running, nr_mapped was greater than the cache after drop_caches. So I think that the problem may indeed not be VirtualBox-specific. VB may just be an extreme case because the VMs consume so much memory. Would be interesting to know if you can reproduce it. Just check meminfo before and after sync && sysctl vm.drop_caches=3 (or 1 or 2) in a terminal. You have to be root for that. > Anyway I concern about the root cause for what > the nr_mapped means to linux, and once this is figured out, I'd recommend > you propose a patch to fix it in the code. What do you think? I also think that nr_mapped is the root of the matter. Perhaps this is what needs to be fixed. We'll see what to do when you have solved that riddle. ************** Before and after "sync && sysctl vm.drop_caches=3". KDE, just 2 terminal-windows and Kate running. 
Before: ~> cat /proc/meminfo MemTotal: 11997636 kB MemFree: 10443668 kB MemAvailable: 10894300 kB Buffers: 89060 kB Cached: 633744 kB SwapCached: 0 kB Active: 906172 kB Inactive: 389528 kB Active(anon): 573848 kB Inactive(anon): 139544 kB Active(file): 332324 kB Inactive(file): 249984 kB Unevictable: 32 kB Mlocked: 32 kB SwapTotal: 10713084 kB SwapFree: 10713084 kB Dirty: 112 kB Writeback: 0 kB AnonPages: 572920 kB Mapped: 228636 kB Shmem: 140504 kB Slab: 108416 kB SReclaimable: 74516 kB SUnreclaim: 33900 kB KernelStack: 5792 kB PageTables: 35900 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 16711900 kB Committed_AS: 2263376 kB VmallocTotal: 34359738367 kB VmallocUsed: 576520 kB VmallocChunk: 34359133740 kB HardwareCorrupted: 0 kB AnonHugePages: 79872 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB DirectMap4k: 7680 kB DirectMap2M: 12238848 kB After: ~> cat /proc/meminfo MemTotal: 11997636 kB MemFree: 10903044 kB MemAvailable: 10925228 kB Buffers: 4480 kB Cached: 320012 kB SwapCached: 0 kB Active: 667800 kB Inactive: 226820 kB Active(anon): 570508 kB Inactive(anon): 141928 kB Active(file): 97292 kB Inactive(file): 84892 kB Unevictable: 32 kB Mlocked: 32 kB SwapTotal: 10713084 kB SwapFree: 10713084 kB Dirty: 644 kB Writeback: 0 kB AnonPages: 570572 kB Mapped: 234980 kB Shmem: 141928 kB Slab: 51624 kB SReclaimable: 17868 kB SUnreclaim: 33756 kB KernelStack: 5776 kB PageTables: 35748 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 16711900 kB Committed_AS: 2257396 kB VmallocTotal: 34359738367 kB VmallocUsed: 576520 kB VmallocChunk: 34359133740 kB HardwareCorrupted: 0 kB AnonHugePages: 79872 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB DirectMap4k: 7680 kB DirectMap2M: 12238848 kB
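[For anyone who wants to repeat the before/after comparison without wading through the full meminfo dump, a small userspace helper along these lines (purely an illustration, not part of any patch) prints just the counters that feed into minimum_image_size(); run it before and after the sync && sysctl vm.drop_caches=3 step:

#include <stdio.h>
#include <string.h>

int main(void)
{
	/* Counters of interest for the minimum_image_size() discussion. */
	static const char *keys[] = {
		"Active(anon):", "Inactive(anon):", "Active(file):",
		"Inactive(file):", "Mapped:", "Shmem:", "SReclaimable:", NULL
	};
	char line[256];
	FILE *f = fopen("/proc/meminfo", "r");

	if (!f) {
		perror("/proc/meminfo");
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		for (int i = 0; keys[i]; i++)
			if (!strncmp(line, keys[i], strlen(keys[i])))
				fputs(line, stdout);
	}
	fclose(f);
	return 0;
}
]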
I booted Debian 7 in "recovery mode" which leads to a commandline environment. Here it was for the first time that nr_mapped was actually *not* greater than the cache after drop_caches:

                    before     after
Active(file):        12152      2036 kB
Inactive(file):      37220       776 kB
Mapped:               2552      2504 kB
After some code walking: some of the mapped pages come from shmem and ipc/shm, and the rest come from the ordinary file page cache, i.e. active/inactive file. So it is possible for mapped to get bigger than active+inactive file. I wonder if there is a potential problem in that mapped contains the number of shmem and ipc pages - I'll take a look.
(In reply to Chen Yu from comment #61) > After some code walking, some parts of the mapped pages comes from shmem and > ipc/shm, > and the rest comes from ordinary file page cache, thus active,inactive file. > So it is possible that mapped got bigger than active+inactive file, I wonder > if there is potential problem that mmaped contains the number of shmem and > ipc, I'll take a look then. Interesting info, I appreciate your effort! The comment to NR_FILE_MAPPED in mmzone.h says /* pagecache pages mapped into pagetables. only modified from process context */ I guess this was the original concept for nr_mapped and the reason why it is used in minimum_image_size(). According to what you have found out, nr_mapped is no longer reliable in that sense. One problem this causes is that minimum_image_size tends to be too big. And there may be others. So I think this should be fixed, if possible. Or at least the a/m comment needs to be modified. ;)
(In reply to Chen Yu from comment #61) > After some code walking, some parts of the mapped pages comes from shmem and > ipc/shm, Just to be sure: *all* of shmem is contained in nr_mapped? Or only part of it? And do you know of a way to query the system about the amount of ipc/shm? (shmem or nr_shmem is of course no problem.) > and the rest comes from ordinary file page cache, thus active,inactive file. > So it is possible that mapped got bigger than active+inactive file, I wonder > if there is potential problem that mmaped contains the number of shmem and > ipc, I'll take a look then.
(In reply to Jay from comment #62) > (In reply to Chen Yu from comment #61) > > After some code walking, some parts of the mapped pages comes from shmem > and > > ipc/shm, > > and the rest comes from ordinary file page cache, thus active,inactive > file. > > So it is possible that mapped got bigger than active+inactive file, I > wonder > > if there is potential problem that mmaped contains the number of shmem and > > ipc, I'll take a look then. > > Interesting info, I appreciate your effort! > > The comment to NR_FILE_MAPPED in mmzone.h says > > /* pagecache pages mapped into pagetables. > only modified from process context */ > > I guess this was the original concept for nr_mapped and the reason why it is > used in minimum_image_size(). > > According to what you have found out, nr_mapped is no longer reliable in > that sense. One problem this causes is that minimum_image_size tends to be > too big. And there may be others. So I think this should be fixed, if > possible. > > Or at least the a/m comment needs to be modified. ;)

After some investigation I have to say: there is no problem with shm, the problem is in minimum_image_size.

First, shm pages are marked as file pages in shmem_mmap, so any mmap of shm memory increases NR_FILE_MAPPED.

Second, newly allocated shm pages are linked into the anon LRU (yes, this is a little weird, but shm does not want users to know its real file position in the system; users just leverage shmget and shmat to use the shm and do not need to know where exactly the shm memory is). So the increment for shm goes to NR_ACTIVE_ANON or NR_INACTIVE_ANON, as well as NR_SHMEM.

So, generally speaking, each time you access a mapped shm region: NR_FILE_MAPPED++, NR_SHMEM++, (NR_ACTIVE_ANON or NR_INACTIVE_ANON)++.

Third, NR_SHMEM is incremented not only when accessing an mmap but also in other contexts, such as fallocate/tmpfs writes; these increase NR_SHMEM without touching the other counters.

please give a solution :)
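[To watch the accounting described in comment #64, something like the following (again just an illustration, not a proposed change) attaches a SysV shared-memory segment and touches every page; Mapped, Shmem and the anon LRU counters in /proc/meminfo should then all grow by roughly the segment size:

#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <unistd.h>

#define SEG_SIZE (512UL * 1024 * 1024)	/* 512 MB segment */

int main(void)
{
	/* Create a private SysV shm segment and map it into this process. */
	int id = shmget(IPC_PRIVATE, SEG_SIZE, IPC_CREAT | 0600);
	if (id < 0) {
		perror("shmget");
		return 1;
	}
	char *p = shmat(id, NULL, 0);
	if (p == (void *)-1) {
		perror("shmat");
		return 1;
	}

	/* Touch every page so it is actually allocated and mapped; per
	 * comment #64 each page should bump NR_FILE_MAPPED, NR_SHMEM and
	 * one of the anon LRU counters. */
	memset(p, 0x5a, SEG_SIZE);

	printf("attached %lu MB of shm; compare /proc/meminfo now\n",
	       SEG_SIZE >> 20);
	pause();			/* keep it attached while you look */

	shmdt(p);
	shmctl(id, IPC_RMID, NULL);	/* clean up */
	return 0;
}
]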
(In reply to Chen Yu from comment #64) > (In reply to Jay from comment #62) > > (In reply to Chen Yu from comment #61) > > > After some code walking, some parts of the mapped pages comes from shmem > and > > > ipc/shm, > > > and the rest comes from ordinary file page cache, thus active,inactive > file. > > > So it is possible that mapped got bigger than active+inactive file, I > wonder > > > if there is potential problem that mmaped contains the number of shmem > and > > > ipc, I'll take a look then. > > > > Interesting info, I appreciate your effort! > > > > The comment to NR_FILE_MAPPED in mmzone.h says > > > > /* pagecache pages mapped into pagetables. > > only modified from process context */ > > > > I guess this was the original concept for nr_mapped and the reason why it > is > > used in minimum_image_size(). > > > > According to what you have found out, nr_mapped is no longer reliable in > > that sense. One problem this causes is that minimum_image_size tends to be > > too big. And there may be others. So I think this should be fixed, if > > possible. > > > > Or at least the a/m comment needs to be modified. ;) > > After some investigation I have to say, there is no problem of shm, the > problem is in minimum_image_size. > > First, for shm pages, it is marked as file-pages in shmem_mmap, so > any mmap to the shm memory would increase the number of NR_FILE_MAPPED. > > Second, the new allocated shm pages are linked into anon lru(yes, this is a > little weird, but the fact is that,shm does not want any user to know its > real file position in the system, users just leverage shmget and shmat to > use the shm, and they do not need to know the exact place where the shm > memory is).So, the increment of shm goes to 'NR_ACTIVE_ANON or > NR_INACTIVE_ANON', as well as NR_SHMEM. > > so general speaking, each time you access the mmaped shm region, the number > of > NR_FILE_MAPPED++, NR_SHMEM++, (NR_ACTIVE_ANON or NR_INACTIVE_ANON)++ > > Third, the increment of NR_SHMEM is not only during accessing mmap, but also > in other context, such as fallocate/tmpfs write, they also increase the > number of NR_SHMEM but without touching other counters. > > > please give a solution:) Mission impossible. :) But first let me see if I understood your findings correctly (I assumed that shm and shmem are the same): 1) NR_FILE_MAPPED = nr_mapped_filecache_pages + nr_mapped_shmem_pages This would explain why nr_mapped can be greater than the file cache after executing drop_caches. And the comment to NR_FILE_MAPPED in mmzone.h should be changed. (Or is there a difference between "filecache pages" and "pagecache pages"?) 2) (NR_ACTIVE_ANON + NR_INACTIVE_ANON) includes nr_mapped_shmem_pages 3) nr_mapped_shmem_pages <= NR_SHMEM If 1) and 2) are true, subtracting NR_FILE_MAPPED as in the original minimum_image_size() would be perfectly right. And approaches 2) and 3) from my Comment 57 would be wrong. Although both work where the orig. doesn't. I have to think about it.
(In reply to Jay from comment #65) > (In reply to Chen Yu from comment #64) > > (In reply to Jay from comment #62) > > > (In reply to Chen Yu from comment #61) > > > > After some code walking, some parts of the mapped pages comes from > shmem and > > > > ipc/shm, > > > > and the rest comes from ordinary file page cache, thus active,inactive > file. > > > > So it is possible that mapped got bigger than active+inactive file, I > wonder > > > > if there is potential problem that mmaped contains the number of shmem > and > > > > ipc, I'll take a look then. > > > > > > Interesting info, I appreciate your effort! > > > > > > The comment to NR_FILE_MAPPED in mmzone.h says > > > > > > /* pagecache pages mapped into pagetables. > > > only modified from process context */ > > > > > > I guess this was the original concept for nr_mapped and the reason why it > is > > > used in minimum_image_size(). > > > > > > According to what you have found out, nr_mapped is no longer reliable in > > > that sense. One problem this causes is that minimum_image_size tends to > be > > > too big. And there may be others. So I think this should be fixed, if > > > possible. > > > > > > Or at least the a/m comment needs to be modified. ;) > > > > After some investigation I have to say, there is no problem of shm, the > > problem is in minimum_image_size. > > > > First, for shm pages, it is marked as file-pages in shmem_mmap, so > > any mmap to the shm memory would increase the number of NR_FILE_MAPPED. > > > > Second, the new allocated shm pages are linked into anon lru(yes, this is a > > little weird, but the fact is that,shm does not want any user to know its > > real file position in the system, users just leverage shmget and shmat to > > use the shm, and they do not need to know the exact place where the shm > > memory is).So, the increment of shm goes to 'NR_ACTIVE_ANON or > > NR_INACTIVE_ANON', as well as NR_SHMEM. > > > > so general speaking, each time you access the mmaped shm region, the number > > of > > NR_FILE_MAPPED++, NR_SHMEM++, (NR_ACTIVE_ANON or NR_INACTIVE_ANON)++ > > > > Third, the increment of NR_SHMEM is not only during accessing mmap, but > also > > in other context, such as fallocate/tmpfs write, they also increase the > > number of NR_SHMEM but without touching other counters. > > > > > > please give a solution:) > > Mission impossible. :) > > But first let me see if I understood your findings correctly (I assumed that > shm and shmem are the same): > > 1) NR_FILE_MAPPED = nr_mapped_filecache_pages + nr_mapped_shmem_pages yes > > This would explain why nr_mapped can be greater than the file cache after > executing drop_caches. > And the comment to NR_FILE_MAPPED in mmzone.h should be changed. (Or is > there a difference between "filecache pages" and "pagecache pages"?) they are the same. > > 2) (NR_ACTIVE_ANON + NR_INACTIVE_ANON) includes nr_mapped_shmem_pages > yes > 3) nr_mapped_shmem_pages <= NR_SHMEM yes > > > If 1) and 2) are true, subtracting NR_FILE_MAPPED as in the original > minimum_image_size() would be perfectly right. And approaches 2) and 3) from > my Comment 57 would be wrong. Although both work where the orig. doesn't. Correct, the original minimum_image_size is doing the right thing.The reason why hibernation fail on your system may be that, the code does not want us to reclaim shmem, maybe it is just a strategy.
(In reply to Chen Yu from comment #66) > (In reply to Jay from comment #65) > > (In reply to Chen Yu from comment #64) > > > (In reply to Jay from comment #62) > > > > (In reply to Chen Yu from comment #61) > > > > > After some code walking, some parts of the mapped pages comes from > shmem and > > > > > ipc/shm, > > > > > and the rest comes from ordinary file page cache, thus > active,inactive file. > > > > > So it is possible that mapped got bigger than active+inactive file, I > wonder > > > > > if there is potential problem that mmaped contains the number of > shmem and > > > > > ipc, I'll take a look then. > > > > > > > > Interesting info, I appreciate your effort! > > > > > > > > The comment to NR_FILE_MAPPED in mmzone.h says > > > > > > > > /* pagecache pages mapped into pagetables. > > > > only modified from process context */ > > > > > > > > I guess this was the original concept for nr_mapped and the reason why > it is > > > > used in minimum_image_size(). > > > > > > > > According to what you have found out, nr_mapped is no longer reliable > in > > > > that sense. One problem this causes is that minimum_image_size tends to > be > > > > too big. And there may be others. So I think this should be fixed, if > > > > possible. > > > > > > > > Or at least the a/m comment needs to be modified. ;) > > > > > > After some investigation I have to say, there is no problem of shm, the > > > problem is in minimum_image_size. > > > > > > First, for shm pages, it is marked as file-pages in shmem_mmap, so > > > any mmap to the shm memory would increase the number of NR_FILE_MAPPED. > > > > > > Second, the new allocated shm pages are linked into anon lru(yes, this is > a > > > little weird, but the fact is that,shm does not want any user to know its > > > real file position in the system, users just leverage shmget and shmat to > > > use the shm, and they do not need to know the exact place where the shm > > > memory is).So, the increment of shm goes to 'NR_ACTIVE_ANON or > > > NR_INACTIVE_ANON', as well as NR_SHMEM. > > > > > > so general speaking, each time you access the mmaped shm region, the > number > > > of > > > NR_FILE_MAPPED++, NR_SHMEM++, (NR_ACTIVE_ANON or NR_INACTIVE_ANON)++ > > > > > > Third, the increment of NR_SHMEM is not only during accessing mmap, but > also > > > in other context, such as fallocate/tmpfs write, they also increase the > > > number of NR_SHMEM but without touching other counters. > > > > > > > > > please give a solution:) > > > > Mission impossible. :) > > > > But first let me see if I understood your findings correctly (I assumed > that > > shm and shmem are the same): > > > > 1) NR_FILE_MAPPED = nr_mapped_filecache_pages + nr_mapped_shmem_pages > yes > > > > This would explain why nr_mapped can be greater than the file cache after > > executing drop_caches. > > And the comment to NR_FILE_MAPPED in mmzone.h should be changed. (Or is > > there a difference between "filecache pages" and "pagecache pages"?) > they are the same. > > > > 2) (NR_ACTIVE_ANON + NR_INACTIVE_ANON) includes nr_mapped_shmem_pages > > > yes > > 3) nr_mapped_shmem_pages <= NR_SHMEM > yes > > > > > > If 1) and 2) are true, subtracting NR_FILE_MAPPED as in the original > > minimum_image_size() would be perfectly right. And approaches 2) and 3) > from > > my Comment 57 would be wrong. Although both work where the orig. doesn't. 
> Correct, the original minimum_image_size is doing the right thing.The reason > why hibernation fail on your system may be that, the code does not want us > to reclaim > shmem, maybe it is just a strategy. I still think that the high nr_mapped values cause the problem. But after your latest findings I also think that this is limited to VirtualBox. Nevertheless, even if it's just a bug in that sofware, some (kernelwise) provision against nr_mapped going through the roof would be good, imo. Anyway - I have adapted my personal safeguard according to your latest findings. The main difference to the orignal is that shrink_all_memory(saveable - size) is called earlier (which has the additional advantage that phase 2 is often bypassed) and minimum_image_size() is modified to limit nr_mapped to its (theoretically) biggest value (see code below). In case of "normal" nr_mapped-values it works just like the orignal, except for the earlier shrink_all_memory(). In higher load cases the values for minimum_image_size are now almost identical with the original - in case of normal nr_mapped, that is (s. examples below). As far as I am concerned, you may close this bug report. I think we've come to a happy end here, or not? Thank you for your time and effort! Jay *** int hibernate_preallocate_memory(void) ... if (saveable > size) { saveable -= shrink_all_memory(saveable - size); } /* * If the desired number of image pages is at least as large as the * current number of saveable pages in memory, allocate page frames for * the image and we're done. */ if (saveable <= size) { pages = preallocate_image_highmem(save_highmem); pages += preallocate_image_memory(saveable - pages, avail_normal); goto out; } /* Estimate the minimum size of the image. */ pages = minimum_image_size(saveable); ... /* 2016-06-26: modified to keep falsely high nr_mapped values in check */ static unsigned long minimum_image_size(unsigned long saveable) { unsigned long size, nr_mapped, max_nr_mapped; nr_mapped = global_page_state(NR_FILE_MAPPED); max_nr_mapped = global_page_state(NR_ACTIVE_FILE) + global_page_state(NR_INACTIVE_FILE) + global_page_state(NR_SHMEM); if (nr_mapped > max_nr_mapped) nr_mapped = max_nr_mapped; size = global_page_state(NR_SLAB_RECLAIMABLE) + global_page_state(NR_ACTIVE_ANON) + global_page_state(NR_INACTIVE_ANON) + global_page_state(NR_ACTIVE_FILE) + global_page_state(NR_INACTIVE_FILE) - nr_mapped; return saveable <= size ? 0 : saveable - size; } *** Examples Normal nr_mapped: PM: Preallocating image memory... nr_mapped = 202515 pages, 791 MB nr_shmem = 116517 pages, 455 MB active_inactive(file) = 594637 pages, 2322 MB reclaimable_anon_slab = 389834 pages, 1522 MB save_highmem = 0 pages, 0 MB saveable = 2112603 pages, 8252 MB highmem = 0 pages, 0 MB additional_pages = 220 pages, 0 MB avail_normal = 3061342 pages, 11958 MB count = 3023165 pages, 11809 MB max_size = 1510448 pages, 5900 MB user_specified_image_size = 1346986 pages, 5261 MB adjusted_image_size = 1346987 pages, 5261 MB saveable = 1657565 pages, 6474 MB active_inactive(file) = 145201 pages, 567 MB reclaimable_anon_slab = 371581 pages, 1451 MB nr_mapped = 199885 pages, 780 MB nr_shmem = 51507 pages, 201 MB minimum_pages = 1337491 pages, 5224 MB target_image_size = 1346987 pages, 5261 MB preallocated_high_mem = 0 pages, 0 MB to_alloc = 1512717 pages, 5909 MB to_alloc_adjusted = 1512717 pages, 5909 MB pages_allocated = 1512717 pages, 5909 MB done (allocated 1676178 pages) PM: Allocated 6704712 kbytes in 4.02 seconds (1667.83 MB/s) ... 
PM: Need to copy 1348817 pages PM: Hibernation image created (1348817 pages copied) minimum_pages, orig.: 5199 MB minimum_pages curr. version: 5224 MB *** High nr_mapped: PM: Preallocating image memory... nr_mapped = 871132 pages, 3402 MB nr_shmem = 95182 pages, 371 MB active_inactive(file) = 980056 pages, 3828 MB reclaimable_anon_slab = 310479 pages, 1212 MB save_highmem = 0 pages, 0 MB saveable = 2415533 pages, 9435 MB highmem = 0 pages, 0 MB additional_pages = 220 pages, 0 MB avail_normal = 3061315 pages, 11958 MB count = 3023138 pages, 11809 MB max_size = 1510435 pages, 5900 MB user_specified_image_size = 1346986 pages, 5261 MB adjusted_image_size = 1346987 pages, 5261 MB saveable = 1566214 pages, 6118 MB active_inactive(file) = 160351 pages, 626 MB reclaimable_anon_slab = 277972 pages, 1085 MB nr_mapped = 867213 pages, 3387 MB nr_shmem = 49347 pages, 192 MB minimum_pages = 1337589 pages, 5224 MB target_image_size = 1346987 pages, 5261 MB preallocated_high_mem = 0 pages, 0 MB to_alloc = 1512703 pages, 5908 MB to_alloc_adjusted = 1512703 pages, 5908 MB pages_allocated = 1512703 pages, 5908 MB done (allocated 1676151 pages) PM: Allocated 6704604 kbytes in 2.63 seconds (2549.27 MB/s) ... PM: Need to copy 1346249 pages PM: Hibernation image created (1346249 pages copied) minimum_pages, orig.: 7797 MB /* would fail */ minimum_pages curr. vers.: 5224 MB *** Normal nr_mapped, high nr_anon: PM: Preallocating image memory... nr_mapped = 71519 pages, 279 MB nr_shmem = 72891 pages, 284 MB active_inactive(file) = 939631 pages, 3670 MB reclaimable_anon_slab = 1185771 pages, 4631 MB save_highmem = 0 pages, 0 MB saveable = 2242008 pages, 8757 MB highmem = 0 pages, 0 MB additional_pages = 220 pages, 0 MB avail_normal = 3061320 pages, 11958 MB count = 3023143 pages, 11809 MB max_size = 1510437 pages, 5900 MB user_specified_image_size = 1346986 pages, 5261 MB adjusted_image_size = 1346987 pages, 5261 MB saveable = 1371575 pages, 5357 MB active_inactive(file) = 103980 pages, 406 MB reclaimable_anon_slab = 1147489 pages, 4482 MB nr_mapped = 69038 pages, 269 MB nr_shmem = 30205 pages, 117 MB minimum_pages = 189144 pages, 738 MB target_image_size = 1346987 pages, 5261 MB preallocated_high_mem = 0 pages, 0 MB to_alloc = 1512706 pages, 5909 MB to_alloc_adjusted = 1512706 pages, 5909 MB pages_allocated = 1512706 pages, 5909 MB done (allocated 1676156 pages) PM: Allocated 6704624 kbytes in 1.15 seconds (5830.10 MB/s) ... PM: Need to copy 1346554 pages PM: Hibernation image created (1346554 pages copied) minimum_pages, orig.: 735 MB minimum_pages curr. vers.: 738 MB
As your proposal is about dealing with insane nr_mapped values (possibly caused by the VirtualBox driver, not by generic shm), I'm closing this thread for now - let's sync offline.
No reopening, just FYI: since kernel 4.2 the workaround shown in Comment 25 doesn't work anymore - nr_mapped is high whether the VM is set to "large pages: on" or not. It still works in 4.1.
I think the bug *is* valid and I think I finally know what's behind it. The explanation is simple: What was probably overlooked is that nr_mapped may be counted *twice* in "saveable" - once as NR_FILE_MAPPED and then as contributor to [in]active(file) and [in]active(anon) (s. Comment 64; s.1). If this is right, we subtract too little from saveable when we subtract [in]active(file) and [in]active(anon) *without* the nr_mapped-component (by subtracting nr_mapped). The nr_mapped-component then remains in saveable. What we then get as minimum_image_size is nr_mapped + nr_mapped. This is why the result for minimum_image_size is indeed always roughly twice the size of nr_mapped - which means: 100 % too high. As long as nr_mapped is relatively low this goes unnoticed. But if nr_mapped is high enough, it comes to light. So I think that *not* subtracting NR_FILE_MAPPED in minimum_image_size() is indeed the right thing to do. This simple solution has also proven very reliable in practice (s. Comment 1): static unsigned long minimum_image_size(unsigned long saveable) { unsigned long size; size = global_page_state(NR_SLAB_RECLAIMABLE) + global_page_state(NR_ACTIVE_ANON) + global_page_state(NR_INACTIVE_ANON) + global_page_state(NR_ACTIVE_FILE) + global_page_state(NR_INACTIVE_FILE); /* - global_page_state(NR_FILE_MAPPED); */ /* return saveable <= size ? 0 : saveable - size; */ return saveable - size; } *** 1) "saveable" always amounts to NR_FILE_MAPPED + [in]active(file) + [in]active(anon), give or take a few pages.
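[A quick cross-check against the numbers already posted in Comment 37 (high-nr_mapped case) fits this reading. There, saveable minus (anon_slab + file) was 881686 pages - i.e. what minimum_image_size() would return without the subtraction - while nr_mapped was 851944 pages; subtracting nr_mapped inside "size" puts that amount back on top of the result:

  minimum_pages (orig.) = 881686 + 851944 = 1733630 pages

which is exactly the value the original version printed - roughly twice nr_mapped, as described above. Independently of that, keeping the original "saveable <= size ? 0 : saveable - size" guard still seems prudent, since both values are unsigned and size could in principle exceed saveable.]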
I'm reopening this because
- the problem is still there
- it's not VirtualBox' fault (s. Comment 50)
- the cause is identified (s. Comment 70)
- the fix is simple and has been tested

If you don't agree with my findings or the fix, please briefly explain why. I wouldn't mind if you then closed the bug again as it is my intention to solve this and not to start an endless debate.
OK, I'll look at this one this week, but it will take me some time to pick up the history (I forgot the details...)
(In reply to Chen Yu from comment #72) > OK I'll look at this one this week, but it will take me sometime to pick up > the history(I forgot the detail..) Thanks. Take your time, it's not urgent.
(In reply to Chen Yu from comment #72) > OK I'll look at this one this week, but it will take me sometime to pick up > the history(I forgot the detail..) Dozed off picking up the history? ;) Understandable. So here's the story in a nutshell:

- s2disk/s2both may *unnecessarily* fail if NR_FILE_MAPPED is high
- the rather popular VirtualBox causes such high nr_mapped, roughly in proportion to the memory assigned to the VM, i.e. a Windows-VM with 2 GB of system memory increases nr_mapped by about 2 GB
- according to your findings, VB does not manipulate memory counters in an illegal way
- the problem is how the minimum image size is calculated in minimum_image_size()
- I think subtracting nr_mapped from the other counters in minimum_image_size() is simply an error in reasoning and leads to a minimum image size that is roughly twice as high as it should be
- this is only veiled by the fact that in most cases nr_mapped is small compared to the other counters
- therefore, not subtracting nr_mapped in minimum_image_size() is the way to go

So let's come to a decision: either try to solve this in the foreseeable future or close this thing now. I could live with both. Because to be frank: I more and more like the idea of being *the only guy on this planet* who enjoys a truly reliable s2disk in Linux, irrespective of what apps are running. :)
The sound of silence was loud enough - so I'm out of here now. I was a bit insistent in this matter because I think it is important - especially since there is no workaround. And I'm afraid that with a shortcoming like this, "Linux on the desktop" is not going to happen. Don't blame it on me. ;)
(In reply to Rainer Fiebig from comment #70)
snip
> So I think that *not* subtracting NR_FILE_MAPPED in minimum_image_size() is
> indeed the right thing to do. This simple solution has also proven very
> reliable in practice (see Comment 1):

If we don't subtract NR_FILE_MAPPED from "size", we get a smaller value for the minimum image size, which means we might over-reclaim some pages compared to the current code flow?
(In reply to Chen Yu from comment #76)
snip
> If we don't subtract NR_FILE_MAPPED from "size", we get a smaller value for
> the minimum image size, which means we might over-reclaim some pages
> compared to the current code flow?

Yes, minimum_image_size() would return a smaller value - that is the intended effect, to prevent unnecessary failure.

And you're right, we might indeed at times over-reclaim. Like in this high-load case:

PM: Preallocating image memory...
...
saveable = 3011927 pages, 11765 MB
nr_mapped = 1479297 pages, 5778 MB
active_inactive(file) = 1012822 pages, 3956 MB
nr_sreclaimable = 33485 pages, 130 MB
active_inactive(anon) = 555363 pages, 2169 MB
...
minimum_pages = 1410257 pages, 5508 MB   /* 11288 MB and failure with orig. */
nr_mapped = 1477778 pages, 5772 MB       /* After shrink_all_memory() */
...
PM: Hibernation image created (1416711 pages copied)

The value returned by minimum_image_size() is < nr_mapped. This makes avail_normal appear greater than it actually is:

	...
	if (avail_normal > pages)
		avail_normal -= pages;
	...

I'm still trying to figure out the consequences if, instead of the first block of the following sequence, the else-block were entered erroneously due to a too optimistic value for avail_normal:

	...
	pages = preallocate_image_memory(alloc, avail_normal);
	if (pages < alloc) {
		...leads to failure
	} else {
		...success
	}

Would s2disk still succeed? Or would it simply fail and we'd get back to the desktop? Or would the system crash? So far I haven't had a crash, only successes or (on intentional overload) failure and return to the desktop.

What's the expert's view?
(In reply to Rainer Fiebig from comment #77)
snip
>
> I'm still trying to figure out the consequences if, instead of the first
> block of the following sequence, the else-block were entered erroneously
> due to a too optimistic value for avail_normal:
>
> 	...
> 	pages = preallocate_image_memory(alloc, avail_normal);
> 	if (pages < alloc) {
> 		...leads to failure
> 	} else {
> 		...success
> 	}
>
> Would s2disk still succeed? Or would it simply fail and we'd get back to the
> desktop? Or would the system crash? So far I haven't had a crash, only
> successes or (on intentional overload) failure and return to the desktop.
>
> What's the expert's view?

Took a closer look. And I think a falsely too high avail_normal is not a problem (a falsely too low one is):

If avail_normal is greater than alloc (falsely or not), preallocate_image_memory() returns either "alloc" or less, depending on how many free pages can be found by preallocate_image_pages(). So it's either success or failure. But no crash. Agrees with what I've been seeing here.

However, if one doesn't like the notion that minimum_pages can at times be less than nr_mapped, one could do something like this:

	/* Estimate the minimum size of the image. */
	pages = minimum_image_size(saveable);
	if (pages < global_node_page_state(NR_FILE_MAPPED))
		pages = global_node_page_state(NR_FILE_MAPPED);
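For illustration only - a rough sketch, not the attached patch: the clamp could also be folded into minimum_image_size() itself. The sketch assumes a kernel where all of these counters are node stats, i.e. readable via global_node_page_state() as in the snippet above, which is not true for every kernel version:

static unsigned long minimum_image_size(unsigned long saveable)
{
	unsigned long size;

	size = global_node_page_state(NR_SLAB_RECLAIMABLE)
		+ global_node_page_state(NR_ACTIVE_ANON)
		+ global_node_page_state(NR_INACTIVE_ANON)
		+ global_node_page_state(NR_ACTIVE_FILE)
		+ global_node_page_state(NR_INACTIVE_FILE);

	/* do not subtract NR_FILE_MAPPED here (see Comment 70)... */
	size = saveable <= size ? 0 : saveable - size;

	/* ...but never report less than what is currently mapped */
	return max(size, global_node_page_state(NR_FILE_MAPPED));
}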
Created attachment 255931 [details]
Proposed fix for failure of s2disk in case of high NR_FILE_MAPPED

Make a copy of the original snapshot.c. Place the patch in the directory above the kernel sources. Open a terminal there and move into the sources dir (e.g.: cd linux-4.9.20). Then test by entering

patch -p1 < ../patch-s2disk_without_debug_k4920_or_later --dry-run

If there are no warnings or error messages, patch by entering

patch -p1 < ../patch-s2disk_without_debug_k4920_or_later

Compare your snapshot.orig to the patched snapshot.c to see what has changed. If that's OK with you, recompile/install the kernel.

If you later want to update the kernel sources, you first have to remove the patch:

patch -R -p1 < ../patch-s2disk_without_debug_k4920_or_later
This on/off communication is unproductive, so let's come to an end now.

As a final contribution I have attached a patch for kernel 4.9.20 or later(*). Perhaps for testing by experienced fellow sufferers. Provided with good intent but without guarantees.

For me, one question was left to be answered: why did s2disk fail in a case with an overflow in minimum_image_size() (https://bugzilla.kernel.org/show_bug.cgi?id=47931#c63) - while overflow always led to success on my system?

In hindsight the answer is pretty simple: in the said case memory was *really* used up:

...
[  110.091830] to_alloc_adjusted=495253 pages, 1934MB   /* wanted */
[  111.824189] pages_allocated=474182 pages, 1852MB     /* actually allocated */
...

But in case of high nr_mapped, more often than not memory only *seems* to be used up - due to the wrong calculation in mis(). And in those cases an overflow in mis() can actually be *necessary* for s2disk to succeed (see 3) and 4) below).

That's all. Thanks for your time and effort!

---

Summary of the situation with the original code, total memory of 12 GB:

1) s2disk succeeds in this case:

saveable = 1815966 pages, 7093 MB
nr_mapped = 702215 pages, 2743 MB
active_inactive(file) = 660676 pages, 2580 MB
nr_sreclaimable = 29769 pages, 116 MB
active_inactive(anon) = 456063 pages, 1781 MB

2) Fails in this:

saveable = 2033746 pages, 7944 MB
nr_mapped = 912466 pages, 3564 MB
active_inactive(file) = 673444 pages, 2630 MB
nr_sreclaimable = 26927 pages, 105 MB
active_inactive(anon) = 475323 pages, 1856 MB

3) May succeed if 2) is followed immediately by a second s2disk.

4) Succeeds at the *first* attempt in a case where memory usage is even *higher* than in case 2):

saveable = 2362834 pages, 9229 MB
nr_mapped = 1423210 pages, 5559 MB
active_inactive(file) = 598559 pages, 2338 MB
nr_sreclaimable = 26542 pages, 103 MB
active_inactive(anon) = 324172 pages, 1266 MB

Who wouldn't be confused by that?

---

*) Instructions: see comment to patch
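For what it's worth, plugging the case-2 numbers into the original formula is a quick back-of-the-envelope check of the point made in Comment 70 (ignoring the clamping to 0 and the other constraints in hibernate_preallocate_memory()): size = 673444 + 26927 + 475323 - 912466 = 263228 pages, so the minimum image size comes out as 2033746 - 263228 = 1770518 pages, about 6916 MB - roughly twice nr_mapped and a very tall order on a 12-GB machine.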
Thanks for your effort. I was a little busy with some critical internal bugs recently, but I promise to return to this one once they are done.
To give a little feedback for you:

I have had Jay's (Rainer Fiebig's) most recent patches from Bug 47931 and from here in use for 1 1/4 days now on kernel 4.11.7, reaching a kind of milestone with 10 successful hibernation/resume cycles so far.

Normally, with in-kernel hibernation, problems begin after attempt No. 3, with segfaults in random processes. Most of the time they affect needed KDE subprocesses by No. 10 at the latest, often earlier, requiring a reboot. Also negative: the resume speed until the desktop incl. Firefox gets interactive again.

This time, patched, I'm astonished by the resume speed and reliability. Approx. 2.5 min to resume to an all-responsive desktop state is very good and close to former TuxOnIce timings. So far only one segfault (FF's plugin-container, meaning the Flash plugin). And only one really slow desktop recovery, at No. 9 - where I had filled the /dev/shm ramdisk to about 1/2 (but not before and not later). The main problem with No. 9 was heavy swapping.

My most relevant setup info: notebook with Core2Duo CPU, 8G RAM, integrated GFX (up to 2G shared), 13G swap on a 2nd internal standard HDD, 3G /dev/shm as ramdisk (often in use). The tests were done under my normal use patterns, with varying /dev/shm and Firefox loads.

I'd keep up testing till milestone 20(?) if all goes well :-)
New findings, as more testing time went by:

* Removed "kmozillahelper": this package most likely corrupted memory of Firefox's parent and child processes, involving KDE, leading to page/protection faults and invoking nasty coredumps via systemd-coredump. (I only mention it here because these coredumps likely occurred after resumes from s2disk; in fact the faults were unpredictable.)

* I've also thoroughly tested the patches from Jay/Rainer Fiebig from Bug 47931, https://bugzilla.kernel.org/attachment.cgi?id=152151 vs. https://bugzilla.kernel.org/attachment.cgi?id=155521, and found that the former patch makes resume timings shorter and more predictable. In corner cases the second patch can leave KDE & Firefox unresponsive for over ~5 min, meaning that rebooting and restarting the applications would be faster. The first patch, on the other hand, always stays between 1 and 2.5 min of resume time until the desktop and FF get responsive. And that's fine.

And BTW, I'm now at 4 d 2 h, 17th resume, with this kernel. The only changes made were to the FF setup.
I forgot to add another comparison: with normal - unpatched - mainline 4.11.8, KDE & Firefox can remain unresponsive for more than 7 min in corner cases.
(In reply to Manuel Krause from comment #82)
snip
> I'd keep up testing till milestone 20(?) if all goes well :-)

It's not a good idea to mix patches. If your problems are caused by high NR_FILE_MAPPED, the patch provided here is the only one you need. Otherwise stick with the original code and a long-term kernel (perhaps 4.9) and try to track down the real issue (graphics?) - for instance by systematically excluding possible troublemakers while keeping everything else unchanged.

BTW: 10 successful hibernations may be a "milestone" for a polar bear. But for a Linux system? ;)
On the grounds of theoretical musings (see Comment 70), the lack of convincing counter-arguments and the practical evidence of by now thousands of successful s2disks, I consider this solved.

In case minimum_image_size() was deliberately designed to return a "generous" value in order to limit pressure on the normal zone on 32-bit machines (and get additional pages from highmem), a conditional compilation could take care of this; a rough sketch follows below.

So long!
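Just to illustrate that idea (a sketch only, assuming CONFIG_HIGHMEM is the right condition to key this on - that would need to be confirmed), the function could keep the old behaviour on highmem configurations and drop the subtraction everywhere else:

static unsigned long minimum_image_size(unsigned long saveable)
{
	unsigned long size;

	size = global_page_state(NR_SLAB_RECLAIMABLE)
		+ global_page_state(NR_ACTIVE_ANON)
		+ global_page_state(NR_INACTIVE_ANON)
		+ global_page_state(NR_ACTIVE_FILE)
		+ global_page_state(NR_INACTIVE_FILE);
#ifdef CONFIG_HIGHMEM
	/*
	 * Keep the old, generous estimate on 32-bit/highmem configurations
	 * to limit pressure on the normal zone (assumption, see above).
	 */
	size -= global_page_state(NR_FILE_MAPPED);
#endif

	return saveable <= size ? 0 : saveable - size;
}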
Fixed in 4.16