Bug 38032 - default values of /proc/sys/net/ipv4/udp_mem are incorrect and can lead to hung system
Summary: default values of /proc/sys/net/ipv4/udp_mem are incorrect and can lead to hung system
Status: RESOLVED CODE_FIX
Alias: None
Product: Memory Management
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: Andrew Morton
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-06-21 00:35 UTC by starlight
Modified: 2012-08-24 14:54 UTC
CC List: 2 users

See Also:
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:



Description starlight 2011-06-21 00:35:20 UTC
In the RHEL 5.5 back-port of this tunable we ran into trouble locking up systems because the boot-time default is set based on physical memory and does not account for the hugepages= boot parameter.  So the UDP socket buffer limit can exceed physical memory.  Don't know if this is an issue in mainline kernels but it seems likely, so reporting this as a courtesy.  Seems like it would be easy to fix the default to account for the memory reserved by hugepages, which is not available for slab allocations.

https://bugzilla.redhat.com/show_bug.cgi?id=714833
Comment 1 starlight 2011-06-21 13:29:51 UTC
The problem is worse than it looks.

It turns out that the increase in 'slab' memory consumption shown by /proc/meminfo is double the apparent socket memory consumption as reported by

netstat -nau | awk '/^udp/ {t+= $2} END {print t}'

so even without hugepages= in the mix, the default 'udp_mem' limit can easily lead to a frozen system.  The commit limit (CommitLimit) reported by /proc/meminfo appears to be a good starting point for setting 'udp_mem'.  We are now taking this value, subtracting 1GB from it, dividing the result by two and using that as the 'udp_mem' maximum.  Since 'udp_mem' is expressed in pages, the byte count must also be divided by 4096 (the page size) to get the final value.  The "min" and "pressure" settings appear to have no obvious effect; we set them both to 7/8 of the max just so something rational appears in these buckets.
Comment 2 Andrew Morton 2011-07-06 23:03:45 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

(cc's added)

On Tue, 21 Jun 2011 00:35:22 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=38032
> 
>            Summary: default values of /proc/sys/net/ipv4/udp_mem does not
>                     consider huge page allocatio
>            Product: Memory Management
>            Version: 2.5
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>         AssignedTo: akpm@linux-foundation.org
>         ReportedBy: starlight@binnacle.cx
>         Regression: No
> 
> 
> In the RHEL 5.5 back-port of this tunable we ran into trouble locking up
> systems because the boot-time default is set based on physical memory and
> does not account for the hugepages= boot parameter.  So the UDP socket
> buffer limit can exceed physical memory.  Don't know if this is an issue
> in mainline kernels but it seems likely, so reporting this as a courtesy.
> Seems like it would be easy to fix the default to account for the memory
> reserved by hugepages, which is not available for slab allocations.
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=714833
> 

Yes, we've made similar mistakes in other places.

I don't think we really have an official formula for what callers
should be doing here.  net/ipv4/udp.c:udp_init() does

        nr_pages = totalram_pages - totalhigh_pages;                            

which assumes that totalram_pages does not include the pages which were
lost to hugepage allocations.

I *think* that this is now the case, but it wasn't always the case - we
made relatively recent fixes to the totalram_pages maintenance.

Perhaps UDP should be using the misnamed nr_free_buffer_pages() here.
Comment 3 starlight 2011-07-07 02:42:41 UTC
For anyone who may not have read the bugzilla, a
possibly larger concern subsequently discovered is
that actual kernel memory consumption is double the
total of the values reported by 'netstat -nau', at
least when mostly small packets are received and
a RHEL 5 kernel is in use.  The tunable enforces based
on the 'netstat' value rather than the actual value
in the RH kernel.  Maybe not an issue in the
mainline, but it took a few additional system
hangs in the lab before we figured this out
and divided the 'udp_mem' maximum value in half.


Comment 4 Eric Dumazet 2011-07-07 03:59:44 UTC
On Wednesday, 6 July 2011 at 21:31 -0400, starlight@binnacle.cx wrote:
> For anyone who may not have read the bugzilla, a
> possibly larger concern subsequently discovered is
> that actual kernel memory consumption is double the
> total of the values reported by 'netstat -nau', at
> least when mostly small packets are received and
> a RHEL 5 kernel is in use.  The tunable enforces based
> on the 'netstat' value rather than the actual value
> in the RH kernel.  Maybe not an issue in the
> mainline, but it took a few additional system
> hangs in the lab before we figured this out
> and divided the 'udp_mem' maximum value in half.
> 

Several problems here:

1) Hugepages can be set up after system boot, and udp_mem/tcp_mem are not
updated accordingly.

2) Using SLUB debug or kmemcheck, for instance, adds a lot of overhead that
we don't take into account (it would probably be expensive to do so).
Even ksize(ptr) cannot really report the memory usage of an object.

3) What happens if both TCP and UDP sockets are in use on the system?
Should we halve both udp_mem and tcp_mem just in case?

Now if you also use SCTP sockets, UDP-Lite sockets, lots of file
mappings, huge pages, connection tracking, POSIX timers (currently not
limited), threads, ..., what happens? Should we then set udp_mem to 1%
of the current limit just in case?

What about fixing the real problem instead ?

When you say the system freezes, is it in the UDP stack, or elsewhere ?
Comment 5 Eric Dumazet 2011-07-07 04:38:23 UTC
Let's use the following patch?

[PATCH] net: refine {udp|tcp|sctp}_mem limits

Current tcp/udp/sctp global memory limits do not take into account
hugepage allocations, and allow 50% of RAM to be used by buffers of a
single protocol [ not counting space used by sockets / inodes ...]

Let's use nr_free_buffer_pages() and allow a default of 1/8 of kernel RAM
per protocol, and a minimum of 128 pages.
Sysadmins of heavy-duty machines probably need to tweak limits anyway.


References: https://bugzilla.stlinux.com/show_bug.cgi?id=38032
Reported-by: starlight <starlight@binnacle.cx>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv4/tcp.c      |   10 ++--------
 net/ipv4/udp.c      |   10 ++--------
 net/sctp/protocol.c |   11 +----------
 3 files changed, 5 insertions(+), 26 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 054a59d..46febca 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3220,7 +3220,7 @@ __setup("thash_entries=", set_thash_entries);
 void __init tcp_init(void)
 {
 	struct sk_buff *skb = NULL;
-	unsigned long nr_pages, limit;
+	unsigned long limit;
 	int i, max_share, cnt;
 	unsigned long jiffy = jiffies;
 
@@ -3277,13 +3277,7 @@ void __init tcp_init(void)
 	sysctl_tcp_max_orphans = cnt / 2;
 	sysctl_max_syn_backlog = max(128, cnt / 256);
 
-	/* Set the pressure threshold to be a fraction of global memory that
-	 * is up to 1/2 at 256 MB, decreasing toward zero with the amount of
-	 * memory, with a floor of 128 pages.
-	 */
-	nr_pages = totalram_pages - totalhigh_pages;
-	limit = min(nr_pages, 1UL<<(28-PAGE_SHIFT)) >> (20-PAGE_SHIFT);
-	limit = (limit * (nr_pages >> (20-PAGE_SHIFT))) >> (PAGE_SHIFT-11);
+	limit = nr_free_buffer_pages() / 8;
 	limit = max(limit, 128UL);
 	sysctl_tcp_mem[0] = limit / 4 * 3;
 	sysctl_tcp_mem[1] = limit;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 48cd88e..198f75b 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2209,16 +2209,10 @@ void __init udp_table_init(struct udp_table *table, const char *name)
 
 void __init udp_init(void)
 {
-	unsigned long nr_pages, limit;
+	unsigned long limit;
 
 	udp_table_init(&udp_table, "UDP");
-	/* Set the pressure threshold up by the same strategy of TCP. It is a
-	 * fraction of global memory that is up to 1/2 at 256 MB, decreasing
-	 * toward zero with the amount of memory, with a floor of 128 pages.
-	 */
-	nr_pages = totalram_pages - totalhigh_pages;
-	limit = min(nr_pages, 1UL<<(28-PAGE_SHIFT)) >> (20-PAGE_SHIFT);
-	limit = (limit * (nr_pages >> (20-PAGE_SHIFT))) >> (PAGE_SHIFT-11);
+	limit = nr_free_buffer_pages() / 8;
 	limit = max(limit, 128UL);
 	sysctl_udp_mem[0] = limit / 4 * 3;
 	sysctl_udp_mem[1] = limit;
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 67380a2..207175b 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -1058,7 +1058,6 @@ SCTP_STATIC __init int sctp_init(void)
 	int status = -EINVAL;
 	unsigned long goal;
 	unsigned long limit;
-	unsigned long nr_pages;
 	int max_share;
 	int order;
 
@@ -1148,15 +1147,7 @@ SCTP_STATIC __init int sctp_init(void)
 	/* Initialize handle used for association ids. */
 	idr_init(&sctp_assocs_id);
 
-	/* Set the pressure threshold to be a fraction of global memory that
-	 * is up to 1/2 at 256 MB, decreasing toward zero with the amount of
-	 * memory, with a floor of 128 pages.
-	 * Note this initializes the data in sctpv6_prot too
-	 * Unabashedly stolen from tcp_init
-	 */
-	nr_pages = totalram_pages - totalhigh_pages;
-	limit = min(nr_pages, 1UL<<(28-PAGE_SHIFT)) >> (20-PAGE_SHIFT);
-	limit = (limit * (nr_pages >> (20-PAGE_SHIFT))) >> (PAGE_SHIFT-11);
+	limit = nr_free_buffer_pages() / 8;
 	limit = max(limit, 128UL);
 	sysctl_sctp_mem[0] = limit / 4 * 3;
 	sysctl_sctp_mem[1] = limit;
Comment 6 David S. Miller 2011-07-07 07:28:32 UTC
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 07 Jul 2011 06:38:10 +0200

> [PATCH] net: refine {udp|tcp|sctp}_mem limits
> 
> Current tcp/udp/sctp global memory limits are not taking into account
> hugepages allocations, and allow 50% of ram to be used by buffers of a
> single protocol [ not counting space used by sockets / inodes ...]
> 
> Lets use nr_free_buffer_pages() and allow a default of 1/8 of kernel ram
> per protocol, and a minimum of 128 pages.
> Heavy duty machines sysadmins probably need to tweak limits anyway.
> 
> 
> References: https://bugzilla.stlinux.com/show_bug.cgi?id=38032
> Reported-by: starlight <starlight@binnacle.cx>
> Suggested-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.
Comment 7 Alan 2012-08-24 14:54:25 UTC
*** Bug 43048 has been marked as a duplicate of this bug. ***
