Bug 11381

Summary: Configurability of shmmax in containers
Product: Other Reporter: Peter Eisentraut (peter_e)
Component: Other Assignee: other_other
Status: CLOSED OBSOLETE    
Severity: enhancement CC: alan
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.26.2 Subsystem:
Regression: No Bisected commit-id:

Description Peter Eisentraut 2008-08-20 05:58:57 UTC
I would like to request that the default shmmax setting be increased or the downsides of that be documented.  Allow me to explain.

I am with the PostgreSQL development team.  PostgreSQL is probably one of the few users of large amounts of SysV shared memory.  Users would usually want to configure anywhere between 10% and 50% of their physical RAM to be used as shared memory, which would translate to something on the order of gigabytes nowadays.  One of the uniformly annoying things about setting this up is that you need to reconfigure the Linux kernel to allow that.  sysctl is nice and all, but it still requires users to learn about operating system and kernel details, requires root access, and distros don't handle sysctl uniformly either.  Maybe there is even a good reason for that, but I couldn't find it, and at least I would like to learn it, so that we can pass that information on to our users.

I did some kernel version archeology and found out that up until kernels 2.2 the shmmax setting appears to have been restricted by CPU-specific constraints, as indicated by the default setting being different across CPUs and being defined in an asm header.  The default setting on i386 was increased from 16 MB to 32 MB somewhere around 1998 in the kernel 2.0 line, and it remains at 32 MB in the latest kernel on all architectures.

Now one question is whether there is a space or time overhead involved with setting a high shmmax limit that isn't actually used.  If so, it would be interesting to know what that overhead is.  The feeling I get from browsing the kernel source code over time is that there was some management overhead and/or some restrictions about this in old kernels, but that nowadays it doesn't really seem to matter much anymore.  I suspect instead that this whole thing was just forgotten, because few applications use large amounts of shared memory.

So, if you want to do us a favor, could you please see about increasing the default shmmax setting to whatever the theoretical maximum is?
Comment 1 Anonymous Emailer 2008-08-20 12:01:50 UTC
Reply-To: akpm@linux-foundation.org


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).


I don't think anybody has even thought about the shmmax default in
years.  Sure, it might be time to reexamine that.

It would be useful to get distro input on this.  Do they override the
kernel default at boot time?  If so, what do they do?


Also, from a quick read it looks to me that shmmax is busted in the
non-init namespace.

clone_ipc_ns() calls shm_init_ns() which does

	ns->shm_ctlmax = SHMMAX;

which a) fails to inherit the parent's setting and b) cannot be altered
from SHMMAX via the sysctl?
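
To make the concern above concrete, here is a minimal userspace sketch, not kernel code. Only the assignment of SHMMAX to shm_ctlmax comes from the message above; `struct ipc_ns_model`, `SHMMAX_DEFAULT`, `model_init_default()` and `model_init_inherit()` are hypothetical names invented for illustration. It contrasts a new namespace that always starts from the compile-time default with one that copies its parent's current limit.

```c
#include <assert.h>
#include <stddef.h>

/* The 32 MB compile-time default discussed in this thread (assumed here). */
#define SHMMAX_DEFAULT 0x2000000UL

/* Hypothetical stand-in for the kernel's IPC namespace: one field only. */
struct ipc_ns_model {
    unsigned long shm_ctlmax;   /* max size of one shared segment, bytes */
};

/* Behaviour described above: a new namespace gets the compile-time
 * default, regardless of what the parent was tuned to. */
void model_init_default(struct ipc_ns_model *child)
{
    assert(child != NULL);
    child->shm_ctlmax = SHMMAX_DEFAULT;
}

/* Alternative raised in (a): the child starts from the parent's
 * current setting instead. */
void model_init_inherit(struct ipc_ns_model *child,
                        const struct ipc_ns_model *parent)
{
    assert(child != NULL && parent != NULL);
    child->shm_ctlmax = parent->shm_ctlmax;
}
```

With a parent tuned to 1 GB, the first function leaves the child at 32 MB while the second leaves it at 1 GB. Whether inheriting is the right default is a separate question that this model does not settle.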
Comment 2 Alexey Dobriyan 2008-08-20 12:12:09 UTC
On Wed, Aug 20, 2008 at 12:00:43PM -0700, Andrew Morton wrote:
> Also, from a quick read it looks to me that shmmax is busted in the
> non-init namespace.
> 
> clone_ipc_ns() calls shm_init_ns() which does
> 
>       ns->shm_ctlmax = SHMMAX;
> 
> which a) fails to inherit the parent's setting and

It is debatable whether such behaviour should be the default; it makes one
ipc_ns more ipc_ns than others.

> b) cannot be altered from SHMMAX via the sysctl?

It can be altered. See the kludge called get_ipc().

(Well, I haven't checked personally, but it's a bug if IPC sysctls
aren't independently controllable after CLONE_NEWIPC; complain loudly if so.)
Comment 3 Alan 2008-08-20 12:12:49 UTC
> It would be useful to get distro input on this.  Do they override the
> kernel default at boot time?  If so, what do they do?

Red Hat provide a sysctl tuning config file and I believe things like the
Oracle install docs cover this.

There is, btw, no earthly reason why a postgres package can't include a
tool to do this, or why postgres can't check and update it as part of its
own setup and config file options.


Alan
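
As a rough illustration of the "check" half of that suggestion, here is a minimal sketch in C. It assumes the limit is read from /proc/sys/kernel/shmmax, the procfs view of the kernel.shmmax sysctl; `read_shmmax()` and `shmmax_allows()` are hypothetical helper names, not part of PostgreSQL or any distro tool. It only reads the limit. Raising it would still need root.

```c
#include <assert.h>
#include <stdio.h>

/* procfs view of the kernel.shmmax sysctl. */
#define SHMMAX_PATH "/proc/sys/kernel/shmmax"

/* Reads the segment size limit (bytes) from the given file.
 * Returns 0 on success, -1 if the file is missing or unparsable. */
int read_shmmax(const char *path, unsigned long long *out)
{
    assert(path != NULL && out != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int parsed = fscanf(f, "%llu", out);
    fclose(f);
    return parsed == 1 ? 0 : -1;
}

/* Returns 1 if a segment of wanted_bytes fits under the limit,
 * 0 if the limit is too small, -1 if the limit could not be read. */
int shmmax_allows(const char *path, unsigned long long wanted_bytes)
{
    unsigned long long limit = 0;
    if (read_shmmax(path, &limit) != 0)
        return -1;
    return wanted_bytes <= limit ? 1 : 0;
}
```

A start-up script could call `shmmax_allows(SHMMAX_PATH, wanted)` and print a hint about the sysctl when the result is 0, rather than failing later in shmget().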
Comment 4 Alexey Dobriyan 2008-08-20 12:16:10 UTC
On Wed, Aug 20, 2008 at 11:12:57PM +0400,  wrote:
> On Wed, Aug 20, 2008 at 12:00:43PM -0700, Andrew Morton wrote:
> > Also, from a quick read it looks to me that shmmax is busted in the
> > non-init namespace.
> > 
> > clone_ipc_ns() calls shm_init_ns() which does
> > 
> >     ns->shm_ctlmax = SHMMAX;
> > 
> > which a) fails to inherit the parent's setting and
> 
> This is debatable if such behaviour should be default, this makes one ipc_ns
> more ipc_ns than others.

Oh, I forgot for a moment that mainline has hierarchical ipc_ns.
Comment 5 Solofo Ramangalahy 2008-08-21 00:59:05 UTC
>>>>> On Wed, 20 Aug 2008 12:00:43 -0700, Andrew Morton
>>>>> <akpm@linux-foundation.org> said:
    Andrew> I don't think anybody has even thought about the shmmax default in
    Andrew> years.

IIRC, Matt (explicit CC: added) did a review of IPC tunables and how
they are changed in the distributions.

    Andrew> It would be useful to get distro input on this.  Do they
    Andrew> override the kernel default at boot time?  If so, what do
    Andrew> they do?

OpenSuse (11.0 at least) has this:
-#define SHMMAX 0x2000000                /* max shared seg size (bytes) */
+#define SHMMAX ULONG_MAX                /* max shared seg size (bytes) */
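
For scale, the arithmetic behind the old value can be sketched in C. The 0x2000000 constant is the one in the diff above, and the 10% to 50% range is from the original report; the 4 GiB machine and the helper names `bytes_to_mib()` and `shm_target_bytes()` are assumptions made up for this example.

```c
#include <assert.h>

/* Old compile-time default from the diff above, in bytes. */
#define OLD_SHMMAX 0x2000000ULL

/* Converts a byte count to whole mebibytes. */
unsigned long long bytes_to_mib(unsigned long long bytes)
{
    return bytes >> 20;
}

/* Shared memory target for a machine with ram_bytes of RAM when
 * the given percentage (0..100) of it is dedicated to shared memory. */
unsigned long long shm_target_bytes(unsigned long long ram_bytes,
                                    unsigned int percent)
{
    assert(percent <= 100);
    return ram_bytes * percent / 100;
}
```

`bytes_to_mib(OLD_SHMMAX)` gives 32, matching the 32 MB mentioned in the report. On the assumed 4 GiB machine, a 25% target is `shm_target_bytes(4ULL << 30, 25)`, i.e. 1 GiB, which is 32 times the old default.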
Comment 6 Alan 2008-09-22 09:50:19 UTC
No consensus on changing the defaults, and distros have tools for doing so. The requirement is that SHMMAX is sane for the user by default, and that means *safe* rather than as big as possible.

Not closing, as #4 needs checking and, if need be, fixing before closure.
Comment 7 Greg Stark 2008-10-14 18:18:00 UTC
Using distribution tools seems kind of backwards. Postgres is packaged by most distributions; why would a distribution ship with one value but have some packages silently change that value behind the admin's back? Besides, 32M isn't "safe" anyway. I have several machines around with less memory than that.

This seems like defining policy rather than mechanism. We don't have a maximum file size, even though allocating a large file can run you out of disk space, nor a hard system-wide maximum process time. We have facilities for limiting these resources, but the kernel comes prepared for the worst and gives the admin the means to limit his or her users if desired.

Keep in mind you're talking about a limit which applies even to the amount of memory a *privileged* process can access. mlock() allows programs to do the same thing and imposes no limits on privileged programs (and a soft limit on unprivileged programs which a setuid program could override locally without changing a global system setting).

I have a feeling the right answer is to junk the system-wide SHMMAX and impose a more usable per-process or per-user limit like the one mlock() uses anyway. There's no fixed system-wide limit which will serve any useful purpose for most people.
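
For comparison, the per-process limit that governs mlock() can be read and adjusted with getrlimit()/setrlimit() on RLIMIT_MEMLOCK. The sketch below uses only those standard calls; `memlock_limits()` and `set_memlock_soft()` are hypothetical wrapper names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/* Fetches this process's soft and hard RLIMIT_MEMLOCK values (bytes).
 * Returns 0 on success, -1 on error. */
int memlock_limits(rlim_t *soft, rlim_t *hard)
{
    assert(soft != NULL && hard != NULL);
    struct rlimit rl;
    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    *soft = rl.rlim_cur;
    *hard = rl.rlim_max;
    return 0;
}

/* Sets the soft limit for this process and its future children.
 * Any value up to the hard limit is allowed without privilege;
 * nothing system-wide changes. Returns 0 on success, -1 on error. */
int set_memlock_soft(rlim_t new_soft)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    if (new_soft > rl.rlim_max)
        return -1;              /* would exceed the hard limit */
    rl.rlim_cur = new_soft;
    return setrlimit(RLIMIT_MEMLOCK, &rl);
}
```

The point of the comparison is scope: these values travel with the process and are inherited by children, whereas SHMMAX is a single number for the whole system (or IPC namespace).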
Comment 8 Francois Cartegnie 2009-02-22 08:05:45 UTC
I ran into the same problem. I'm working on a solution (I still have to figure out how to deal with namespaces).
I'm migrating the SHM (maybe all IPC) system-wide limits to rlimits.
This would enable inheritance and let user processes tune the limits.
Besides, hard limits can still be raised by privileged users.
This would also allow setting custom limits per account through the pam_limits extension.
Comment 9 Alan 2012-05-22 13:32:52 UTC
Closing as obsolete; if this is incorrect, please reopen against a recent kernel.