Distribution: Any
Hardware Environment: x86 and other 32-bit platforms
Software Environment:
Problem Description:

The following issue affects the setrlimit() and getrlimit() system calls on Linux 2.6.13 (and earlier) on x86. (And probably it occurs on other 32-bit Linux platforms as well.)

Internally, resource limits are represented in the 'rlimit' structure (defined in include/linux/resource.h) as unsigned longs, meaning 32 bits on x86. However, this data type is not wide enough. The most pertinent limit here is RLIMIT_FSIZE, which specifies the maximum size to which a file can grow: to be useful, this limit must be represented using a type that is as wide as the type used to represent file offsets, i.e., as wide as a 64-bit off_t.

Current versions of glibc (e.g., 2.3.5) deal with this situation somewhat strangely: if a program compiled with _FILE_OFFSET_BITS set to 64 (i.e., off_t is thus 'long long' -- 64 bits) tries to set a resource limit to a value larger than can be represented in a 32-bit unsigned long, then the glibc wrapper for setrlimit() silently converts the limit value to RLIM_INFINITY. (The rlimit_large.c program below can be used to demonstrate this behaviour.) In other words, the requested resource limit setting is silently ignored. (One could argue that perhaps the glibc wrapper should give an error, rather than silently turning a very large limit into infinity; however, the glibc developers instead seem to have decided on the current behaviour as a means of dealing with what is fundamentally a kernel problem.)

(NOTE: This problem is not merely a theoretical one facing programmers developing new applications. Since many x86 distributions compile all (file) utilities with -D_FILE_OFFSET_BITS=64, this issue can bite end-users as well, if they expect to be able to set resource limits greater than 2^32-1.)
I guess that the solution to this problem would require new setrlimit64() and getrlimit64() system calls on x86, and the existing 32-bit system calls would need to be retained so that existing binaries would still run.

The one open question is what a 32-bit getrlimit() should then do when a limit greater than 4GB has been set (i.e., the program may have inherited this limit after being execed by a program that was compiled with -D_FILE_OFFSET_BITS=64; the SUSv3 specification of exec() says: "The saved resource limits in the new process image are set to be a copy of the process' corresponding hard and soft limits."). In this case, the kernel should implement the RLIM_SAVED_CUR and RLIM_SAVED_MAX feature (which will mean changing the values of those constants, since they are currently defined in glibc to have the same value as RLIM_INFINITY). SUSv3 describes the use of these constants as follows:

    When using the getrlimit() function, if a resource limit can be represented correctly in an object of type rlim_t, then its representation is returned; otherwise, if the value of the resource limit is equal to that of the corresponding saved hard limit, the value returned shall be RLIM_SAVED_MAX; otherwise, the value returned shall be RLIM_SAVED_CUR.

    When using the setrlimit() function, if the requested new limit is RLIM_INFINITY, the new limit shall be "no limit"; otherwise, if the requested new limit is RLIM_SAVED_MAX, the new limit shall be the corresponding saved hard limit; otherwise, if the requested new limit is RLIM_SAVED_CUR, the new limit shall be the corresponding saved soft limit; otherwise, the new limit shall be the requested value. In addition, if the corresponding saved limit can be represented correctly in an object of type rlim_t then it shall be overwritten with the new limit.

    The result of setting a limit to RLIM_SAVED_MAX or RLIM_SAVED_CUR is unspecified unless a previous call to getrlimit() returned that value as the soft or hard limit for the corresponding resource limit.

Cheers,

Michael

PS The following rationale from the Large File Summit is useful background:

    A.2.1.1.13 getrlimit() and setrlimit()

    These functions map limits that they cannot represent correctly to and from RLIM_SAVED_MAX and RLIM_SAVED_CUR. These values do not require any special handling by programs. They may be thought of as tokens that the kernel hands out to programs that can't handle the real answer, and that remind the kernel, when the tokens come back from the user, of what value is really meant.

    If setrlimit() fails for any reason (for example, EPERM), the resource limits and saved resource limits remain unchanged.

    This proposal does not specify any particular value for RLIM_INFINITY, RLIM_SAVED_MAX or RLIM_SAVED_CUR. Typical current implementations use the value 0x7FFFFFFF for RLIM_INFINITY, and it is recommended that RLIM_SAVED_MAX and RLIM_SAVED_CUR have similar large values.

    Few, if any, programs will need to refer explicitly to RLIM_SAVED_MAX or RLIM_SAVED_CUR. Those that do should not use them in C-language switch cases since they may have the same value in some implementations (see 2.2.2.3 <sys/resource.h>).

    A limit that can be represented correctly in an object of type rlim_t is either "no limit", which is represented with RLIM_INFINITY, or has a value not equal to any of RLIM_INFINITY or RLIM_SAVED_MAX or RLIM_SAVED_CUR and which can be represented correctly in an object of type rlim_t and which meets any additional implementation-specific criteria for correct representation.

    A rejected alternative proposal was to map limits that could not be represented to and from RLIM_INFINITY. This would avoid the need for the new symbols RLIM_SAVED_MAX and RLIM_SAVED_CUR. But such mapping would arguably be a lie, and the resulting information loss would cause unintuitive program behavior, especially in programs running with appropriate privileges needed to raise hard limits.

    A rejected alternative proposal was that if getrlimit() could not correctly return a current limit then it should instead return -1 and set errno to EOVERFLOW. But that would result in unnecessary breakage of programs. (Note that this breakage occurs even when no large files are present.) It would also result in malfunction of programs that assume that they are calling getrlimit() properly and so failure "cannot happen". For example, in the 4.4 BSD-Lite distribution, there are at least 15 unchecked calls to getrlimit(). When the 4.4 BSD csh limit function is used to report the current limits, there is no check of the return code and so the reported results can be entirely incorrect. Also, non-superuser programs typically unlimit themselves with:

        getrlimit(RLIMIT_STACK, &rl);
        rl.rlim_cur = rl.rlim_max;
        setrlimit(RLIMIT_STACK, &rl);

    If the getrlimit() fails then garbage is passed to setrlimit() which may result in an unwanted and extremely restricted limit. Several utilities that are part of the GNU C compiler have this problem.
/* rlimit_large.c */

#define _FILE_OFFSET_BITS 64
#include <sys/resource.h>
#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>

#define errMsg(msg)  { perror(msg); }

#define errExit(msg) { perror(msg); exit(EXIT_FAILURE); }

/* Print the soft and hard limits for 'resource', prefixed by 'msg' */
static void
printRlimit(const char *msg, int resource)
{
    struct rlimit rlim;

    if (getrlimit(resource, &rlim) == -1) errExit("getrlimit");

    printf("%s soft=", msg);
    if (rlim.rlim_cur == RLIM_INFINITY)
        printf("infinite");
    else if (rlim.rlim_cur == RLIM_SAVED_CUR)
        printf("unrepresentable");
    else
        printf("%lld", (long long) rlim.rlim_cur);

    printf("; hard=");
    if (rlim.rlim_max == RLIM_INFINITY)
        printf("infinite\n");
    else if (rlim.rlim_max == RLIM_SAVED_MAX)
        printf("unrepresentable\n");
    else
        printf("%lld\n", (long long) rlim.rlim_max);
} /* printRlimit */

int
main(int argc, char *argv[])
{
    struct rlimit rl;

    /* Show the value of the glibc constant, just for interest */
    printf("RLIM_INFINITY=%llx\n", (unsigned long long) RLIM_INFINITY);

    if (getrlimit(RLIMIT_FSIZE, &rl) == -1) errExit("getrlimit");
    printRlimit("Initial RLIMIT_FSIZE limits : ", RLIMIT_FSIZE);

    /* A limit that fits in 32 bits: expected to be set as requested */
#define VAL1 10000
    rl.rlim_cur = VAL1;
    printf("About to set rlim_cur to %lld\n", (long long) rl.rlim_cur);
    if (setrlimit(RLIMIT_FSIZE, &rl) == -1) errExit("setrlimit");
    if (getrlimit(RLIMIT_FSIZE, &rl) == -1) errExit("getrlimit");
    printRlimit("RLIMIT_FSIZE limits after setrlimit(): ", RLIMIT_FSIZE);

    /* A limit that does not fit in 32 bits: this is the problem case */
#define VAL2 4999222333LL
    rl.rlim_cur = VAL2;
    printf("About to set rlim_cur to %lld\n", (long long) rl.rlim_cur);
    if (setrlimit(RLIMIT_FSIZE, &rl) == -1) errExit("setrlimit");
    if (getrlimit(RLIMIT_FSIZE, &rl) == -1) errExit("getrlimit");
    printRlimit("RLIMIT_FSIZE limits after setrlimit(): ", RLIMIT_FSIZE);

    exit(EXIT_SUCCESS);
} /* main */
Michael, thanks for the comprehensive and clear report. I agree, the 32/64 syscall is the right way forward. The issue of what to do with 32bit version when value is > ulong has a fundamental ABI issue. If we add RLIM_SAVED_MAX and RLIM_SAVED_CUR, we have to reserve those (obvious choice would be 0xfffffffe), which wasn't reserved before, so there's still room for userspace confusion. Albeit, this is an unlikely used value, it is an ABI change. Also, how do we rationalize the difference between 32/64 bit RLIM_INFINITY? Hmm, I'll pick away at a patch to do this and see where it goes. It's certainly not pressing priority, so we don't need to worry for 2.6.13.
* Michael Kerrisk (michael.kerrisk@gmx.net) wrote:
> Thanks. I must confess to having a bit of help from Geoff
> Clare of The Open Group in straightening out the details.

Ah, nice ;-)

> > I agree, the 32/64 syscall is the right way forward.
> > The issue of what to do with 32bit version when value is > ulong
> > has a fundamental ABI issue. If we add RLIM_SAVED_MAX
> > and RLIM_SAVED_CUR, we have to reserve those (obvious choice
> > would be 0xfffffffe), which wasn't reserved before, so there's
> > still room for userspace confusion. Albeit, this is an
> > unlikely used value, it is an ABI change.
>
> Yes, I hadn't thought about that. In case it's of interest,
> here are how RLIM_SAVED_CUR, RLIM_SAVED_MAX, and RLIM_INFINITY
> seem to be defined (for 32-bits) on a few systems:

Thanks.

>             RLIM_SAVED_CUR     RLIM_SAVED_MAX     RLIM_INFINITY
> Irix 6.5    0x7ffffffd         0x7ffffffe         0x7fffffff
> AIX 5.1     (RLIM_INFINITY-2)  (RLIM_INFINITY-1)  0x7FFFFFFF
> Solaris 8   0x7ffffffd         0x7ffffffe         0x7fffffff

Yeah, I figured we'd just overlap RLIM_SAVED_CUR and RLIM_SAVED_MAX, since they have the same meaning for different fields in a struct. I suppose if someone wanted to lower their hard limit to the soft limit it would be useful to be able to distinguish. I had just hoped to rob as few currently legal values as possible. (Of course, not to mention that we have RLIM_INFINITY as 0xffffffff and 0x7fffffff depending on hardware platform, but that's not a real issue).
> http://bugzilla.kernel.org/show_bug.cgi?id=5042
> > > I agree, the 32/64 syscall is the right way forward.
> > > The issue of what to do with 32bit version when value is > ulong
> > > has a fundamental ABI issue. If we add RLIM_SAVED_MAX
> > > and RLIM_SAVED_CUR, we have to reserve those (obvious choice
> > > would be 0xfffffffe), which wasn't reserved before, so there's
> > > still room for userspace confusion. Albeit, this is an
> > > unlikely used value, it is an ABI change.
> >
> > Yes, I hadn't thought about that. In case it's of interest,
> > here are how RLIM_SAVED_CUR, RLIM_SAVED_MAX, and RLIM_INFINITY
> > seem to be defined (for 32-bits) on a few systems:
>
> Thanks.
>
> >             RLIM_SAVED_CUR     RLIM_SAVED_MAX     RLIM_INFINITY
> > Irix 6.5    0x7ffffffd         0x7ffffffe         0x7fffffff
> > AIX 5.1     (RLIM_INFINITY-2)  (RLIM_INFINITY-1)  0x7FFFFFFF
> > Solaris 8   0x7ffffffd         0x7ffffffe         0x7fffffff
>
> Yeah, I figured we'd just overlap RLIM_SAVED_CUR and RLIM_SAVED_MAX.
> Since they have the same meaning for different fields in a struct.
> I suppose if someone wanted to lower their hard limit to the soft limit it
> would be useful to be able to distinguish.

Yes. If (and SUSv3 says only if) either of these two constant values is returned for a particular resource limit by getrlimit(), then they can also be used in a subsequent setrlimit() call to change the settings of that limit:

* if either rlim_cur or rlim_max is set to RLIM_SAVED_CUR, then the resource limit is set to the soft limit value that was in effect before the call;
* if either rlim_cur or rlim_max is set to RLIM_SAVED_MAX, then the resource limit is set to the hard limit value that was in effect before the call.

So, I think the two values must be distinct.

Cheers,

Michael
This is a bit ugly. I suppose we could fix it by adding a new rlimit which specifies RLIMIT_FSIZE in units of getpagesize(). Or add a new syscall just to set RLIMIT_FSIZE. All rather unpleasant. Ulrich, any thoughts? It doesn't seem terribly important?
glibc has been waiting for the longest time for a [gs]etrlimit64 implementation in the kernel. The LFS extensions define such interfaces and we implement them, but obviously this is done using the old 32-bit interfaces, hence this limitation. struct rlimit64 is simply a type with 64-bit members; otherwise it's the same as struct rlimit. I suggest implementing these syscalls. Of course we could also say "who cares about 32-bit these days" and continue to ignore the problem.
I have implemented the two syscalls and am attaching the patch for 2.6.24.3 and a test program (fsizerlim64.c) to get the limits.

Issues faced:
* Though the limits are initialized to RLIM64_INFINITY, garbage values end up in them.

- The output of the test program is:

narendra@infinity:~$ gcc fsizerlim64.c
narendra@infinity:~$ ./a.out
retval : 0
rlim | max64 = f496719400000001 | cur64 = f4967194
narendra@infinity:~$ ./a.out
retval : 0
rlim | max64 = fe86a13df498a628 | cur64 = f499ff98f499ff98

- I placed some printks in sys_getrlimit64; the output of dmesg is as follows:

dmesg
[ 111.221402] resource = 1, RLIM64_INFINITY = ffffffffffffffff, RLIMIT_FSIZE = 1, RLIM64_NLIMITS = 2
[ 111.221411] current rlim64 : max64 = f4967194, cur64 = f496719400000001
[ 111.221416] value (local var, before) : max64 = c02f9730b7f4cce0, cur64 = b7f18ff4f4ae5e00
[ 111.221421] value (after assignment) : max64 = f4967194, cur64 = f496719400000001
[ 118.437395] resource = 1, RLIM64_INFINITY = ffffffffffffffff, RLIMIT_FSIZE = 1, RLIM64_NLIMITS = 2
[ 118.437406] current rlim64 : max64 = f499ff98f499ff98, cur64 = fe86a13df498a628
[ 118.437411] value (local var, before) : max64 = c02f9730b7f94ce0, cur64 = b7f60ff4f4b41e00
[ 118.437419] value (after assignment) : max64 = f499ff98f499ff98, cur64 = fe86a13df498a628
Created attachment 15346: patch 2.6.24.3 rlimit64
Created attachment 15347: fsizerlimit64.c
Can anybody give me some indication of how to proceed further?
Please send the patch to linux-kernel@vger.kernel.org linux-fsdevel@vger.kernel.org and myself as per Documentation/SubmittingPatches, http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt, etc. Thanks.
Hi Andrew, As per the documentation, I submitted the patch to the above mailing lists on 23 and 31 March 2008. May I know when it can be merged into the mainline kernel tree? Thanks.
I looked over the patch briefly and I had rather a lot of issues with it. It will take me some time to comment on them in detail. I'll try to get onto that. Feel free to remind me if I don't.
(In reply to comment #12)
> I looked over the patch briefly and I had rather a lot of issues with it.
> It will take me some time to comment on them in detail. I'll try to get
> onto that. Feel free to remind me if I don't.

Hi Andrew,

This is your reminder ;-).

Narendra, if you do another iteration of this patch, please CC me when writing to the lists.

Cheers,

Michael
I've again created a patch, this time for kernel v2.6.26, and tested it, but I am running into the issues mentioned in Comment #6. I would appreciate it if anybody could help me resolve the issue.

Herewith I am attaching the patch and the program with which we can retrieve the limit values stored in the kernel.

Overview of the Patch
=====================
The patch adds two syscalls, sys_getrlimit64() and sys_setrlimit64().

1. Header Files
   include/linux/resource.h
      Defined 'struct rlimit64' to accommodate the maximum limit boundaries
   include/linux/sched.h
      Declared struct rlimit64 rlim64[RLIM64_NLIMITS]; in task_struct to hold the limits for each process
   include/linux/init_task.h
      Initialized .rlim64 with INIT_RLIMITS64 (defined in include/asm-generic/resource.h)
   include/asm-x86/unistd_32.h
      Defined syscall numbers for sys_setrlimit64 (327) and sys_getrlimit64 (328)

2. Source Files (.c)
   kernel/sys.c
      Added the sys_getrlimit64 and sys_setrlimit64 functions.
Created attachment 17025: patch-2.6.26-rlim64
Created attachment 17026: fsizerlim64.c for patch-2.6.26-rlim64
(In reply to comment #14)
> I've again created a patch for kernel v2.6.26 and tested it but running into
> the issues mentioned in Comment #6.

The issue with the patch is that when a process is created, the 'rlim64' object (of the signal object in task_struct) must also be assigned appropriate values, in addition to 'rlim'. This is done by modifying kernel/fork.c and kernel/exit.c.

> I would appreciate if anybody help me out to resolve the issue.
>
> Herewith I am attaching the patch and the program whereby we can retrieve the
> limit values stored in the kernel.
>
> Overview of the Patch
> =====================
> The patch adds two syscalls such as sys_getrlimit64() sys_setrlimit64()
> 1. Header Files
>    include/linux/resource.h
>       Defined 'struct rlimit64' to accommodate limit max boundaries
>    include/linux/sched.h
>       Declared struct rlimit64 rlim64[RLIM64_NLIMITS]; in task_struct to have
>       the limits for each process
>    include/linux/init_task.h
>       Initialized .rlim64 with INIT_RLIMITS64
>       ( is defined in include/asm-generic/resource.h)
>    include/asm-x86/unistd_32.h
>       Defined syscall numbers for sys_setrlimit64(327), sys_getrlimit64(328)
> 2. Source Files (.c)
>    kernel/sys.c
>       added sys_getrlimit64, sys_setrlimit64 functions.

The modified patch now works as expected. The test results observed when running fsizerlim64.c are as follows:

getrlimit64: Limits in the Kernel ....
retval : 0
rlim | max64 = ffffffffffffffff
rlim | cur64 = ffffffffffffffff
setrlimit64: setting the following limits ...
retval : 0
rlim | max64 = 1122334455667788
rlim | cur64 = 1122334455667788
getrlimit64: Limits in the Kernel set ....
retval : 0
rlim | max64 = 1122334455667788
rlim | cur64 = 1122334455667788

The patch and the user space program can be found in the attachment.
Created attachment 17145: working patch-2.6.26-rlim64
Created attachment 17146: fsizerlim64.c: user space program to test working patch-2.6.26-rlim64
Hi Andrew, As per the documentation, I submitted the patch to the mailing lists on 08-08-2008 & 19-Aug-2008. May I know when it can be merged into the mainline kernel tree? Thanks, Narendra.
Grabbing this bug to progress it
Hi Andrew, I am attaching the patch, patch-2.6.29-rc2-rlim64, for the latest pre-patched kernel, linux-2.6.29-rc2. Can you consider this patch for merging? Thanks, Narendra.
Created attachment 19861: patch-2.6.29-rc2-rlim64
Created attachment 19862: fsizerlim64.c: User space program to test the patch patch-2.6.29-rc2-rlim64
Hi Folks, I have sent the patch to linux-kernel and linux-fsdevel, but it does not appear in the mailing list archives. Can anybody help me in sending the patch to the above mailing lists, and also let me know what the problem could be? Note that my mail-id is subscribed to the lists. Thanks in advance. Thanks, Narendra.
Any progress here? What is the patch supposed to do? It only stores the rlimit64 without any use of it except returning values from that back via getrlimit64. The rest of resource handling in the kernel still uses 32-bit rlimits.
(In reply to comment #26)
> Any progress here?
>
> What is the patch supposed to do? It only stores the rlimit64 without any use
> of it except returning values from that back via getrlimit64. The rest of
> resource handling in the kernel still uses 32-bit rlimits.

I have started working on this, as at last I've got an x86_64 platform. Here are the tasks needed to complete the work:

1. Insert a check to send the SIGXFSZ signal if the user exceeds the file size limit rlim64_cur. --> DONE
2. Add compat system calls for setrlimit64/getrlimit64. --> IN PROGRESS
The 2.6.36 kernel adds prlimit(), which does not suffer the problem noted in this bug. The new system call could be used to fix this problem within the glibc wrappers. I've reported this against glibc: http://sources.redhat.com/bugzilla/show_bug.cgi?id=12201
Just for the record on why this bug was (rightly) marked obsolete. As a consequence of http://sources.redhat.com/bugzilla/show_bug.cgi?id=12201, glibc's getrlimit/setrlimit wrappers have been modified to use prlimit(2), which does not suffer this issue.