Latest working kernel version: ?
Earliest failing kernel version: 2.6.9-42.7 x86, 2.6.9-22.0.2.ELsmp x86_64
Distribution: RHEL-5 / Gentoo

I'm running into a weird set of issues when dealing with getgrgid(3) on Linux. It appears that there was a bug with the values returned by getgrgid(3) between kernel versions 2.6.9 and 2.6.23.

The first issue is the fact that particular kernel versions (2.6.9 in this example) were looking up rguid's incorrectly.

The second issue is the fact that contemporary kernel versions do not set an appropriate errno value when an error occurs. Not being able to find a user entry should not return SUCCESS (errno=0).

The transcript below gives an idea of what occurred; if I omitted anything, please let me know. This round of testing was prompted by incorrect exit codes returned by id(1).

-----------
[root@nova-infra-test1 ~]# id nobody; echo $?
uid=99(nobody) gid=99 groups=25(eng),15045(enged),99
1

After looking at id.c for coreutils-5.21, I narrowed the issue down to getgrgid(3).
The errno makes absolutely no sense though, and it seems to have been "fixed" between various kernel versions:

[root@nova-infra-test1 ~]# ./getgr_t_wrapper.sh
42 ($i) not found in /etc/group
99 ($i) not found in /etc/group
Error encountered with getgrgid(99): No such file or directory
[root@nova-infra-test1 ~]# uname -a
Linux nova-infra-test1 2.6.9-22.0.2.ELsmp #1 SMP Thu Jan 5 17:13:01 EST 2006 i686 i686 i386 GNU/Linux

headless-horseman src # ~gcooper/getgr_t_wrapper.sh
42 ($i) not found in /etc/group
Error encountered with getgrgid(42): 0, Success
99 ($i) not found in /etc/group
Error encountered with getgrgid(99): 0, Success
headless-horseman src # uname -a
Linux headless-horseman 2.6.23.17 #3 SMP Mon Mar 24 04:34:56 PDT 2008 i686 Intel(R) Xeon(R) CPU 5140 @ 2.33GHz GenuineIntel GNU/Linux

hh-internal ~ # ./getgr_t_wrapper.sh ; uname -a
42 ($i) not found in /etc/group
Error encountered with getgrgid(42): 0, Success
99 ($i) not found in /etc/group
Error encountered with getgrgid(99): 0, Success
Linux hh-internal 2.6.24.3 #1 SMP Sat Apr 5 18:49:16 GMT 2008 x86_64 Intel(R) Xeon(R) CPU 5140 @ 2.33GHz GenuineIntel GNU/Linux

"nova-infra-test1" is on an NIS domain whereas "headless-horseman" and "hh-internal" are not. The first two kernels are patched and the last one is not.
Created attachment 15960
Test C-file for getgrguid(3).
Created attachment 15961
Wrapper script for C-test file.

Test cases with i=42 and i=99 *should* fail (well, they did on my system ;)..) unless /etc/group has those GID entries.
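For anyone reproducing this without the attachments: the sketch below is not the attached test file, only a minimal illustration of how a getgrgid(3) caller can tell "no such group" apart from a genuine lookup failure. It assumes the POSIX convention that errno must be cleared before the call, and that a NULL return with errno still 0 means no matching entry. The names lookup_group and enum lookup_result are hypothetical and exist only for this example. Some glibc versions are documented to set errno (for example ENOENT) even when the entry is simply absent, so the LOOKUP_ERROR case may also cover "not found" on those systems.

```c
#include <assert.h>
#include <errno.h>
#include <grp.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical outcome codes, defined only for this sketch. */
enum lookup_result { LOOKUP_FOUND, LOOKUP_NOT_FOUND, LOOKUP_ERROR };

/*
 * Look up gid with getgrgid(3) and classify the outcome.
 * *grp_out receives the entry (or NULL); *err_out receives the errno
 * value observed right after the call (0 if it was left untouched).
 */
enum lookup_result lookup_group(gid_t gid, const struct group **grp_out,
                                int *err_out)
{
    assert(grp_out != NULL && err_out != NULL);

    errno = 0;                        /* clear any stale value first */
    const struct group *grp = getgrgid(gid);
    int saved_errno = errno;          /* save before anything overwrites it */

    *grp_out = grp;
    *err_out = saved_errno;

    if (grp != NULL)
        return LOOKUP_FOUND;

    /* NULL with errno still 0: the POSIX "no matching entry" case. */
    if (saved_errno == 0)
        return LOOKUP_NOT_FOUND;

    /* NULL with errno set: a real error, or a libc that reports
     * "not found" through errno (e.g. ENOENT). */
    return LOOKUP_ERROR;
}
```

A caller would print strerror() of the saved value only for LOOKUP_ERROR; printing it for LOOKUP_NOT_FOUND is what yields a message like "0, Success".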
The documentation (provided by Debian?) available in Gentoo is also out of date and doesn't reflect this change in the code between 2.6.9 and 2.6.24.
Michael, can you look at this bug?
Garrett,

I'm having a little trouble understanding your bug report. The problem is, I don't see a simple statement of what you get, and what you expect.

> It appears that there was a bug with the values returned by
> getgrgid(3) between kernel version 2.6.9 and 2.6.23.

getgrgid(3) is not a kernel interface. It's a glibc interface. What version of glibc are you using?

> The first issue is the fact that particular kernel versions
> (2.6.9 in this example) were looking up rguid's incorrectly.

Can you please explain this? Which part of the kernel (i.e., what system call) is looking up rguids incorrectly? What do you mean by "incorrect"?

> The second issue is the fact that contemporary kernel versions
> do not set an appropriate errno value when an error occurs. Not
> being able to find a user entry should not return SUCCESS(errno=0).

Does the following glibc bug report have relevance here?
http://sources.redhat.com/bugzilla/show_bug.cgi?id=3195

In the comments of your C program you write:

 * It's been proven that this test fails on some kernel versions
 * (2.6.9 with RHEL-5 for instance), in particular when
 * getgrguid(3) != getgeguid(3).

But there is no such API as getgeguid().
You're right. getgeguid doesn't exist. I was thinking (r = real, e = executing). I think it's 2.3.5, but I don't have root access on the Redhat machine so I can't tell via rpm or yum. The RHEL version is RHEL-5 I think (RHEL-4, Nahant update 4), correct?
(In reply to comment #6)
> You're right. getgeguid doesn't exist. I was thinking (r = real, e =
> executing).
> I think it's 2.3.5,

What is "it"? glibc? You left many of my other questions unanswered, which makes it hard to provide further input...

> but I don't have root access on the Redhat machine so I
> can't tell via rpm or yum.

If you are trying to determine the glibc version, then just execute the libc file, something like:

$( ldd /bin/ls | grep libc.so )

> The RHEL version is RHEL-5 I think (RHEL-4, Nahant update 4), correct?

I don't know what you are referring to here.
Regardless of the comments, the bug appears to be invalid from the kernel's end, since the issue appears to be with whatever library / access method is used to look up the user entry... the issue is no doubt from nscd, or some other associated means. I say this because if you look at the actual "glibc" function reference, it's an extern to another library (just do "grep 'getgrgid(' /usr/include/grp.h").

ldd only reveals the libc.so revision (which is .6, but many versions of libc.so in the 2.3.x ~ 2.5.x series are .6 I think).
(In reply to comment #8)
> ldd only reveals the libc.so revision (which is .6, but many versions of
> libc.so in the 2.3.x ~ 2.5.x series are .6 I think).

Hmmm -- I missed a piece in my commands:

$( ldd /bin/ls | grep libc.so | awk '{ print $3}')

Note that the initial $ is *not* the shell prompt -- it's part of "$(".
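If running the libc file is not an option, the version can also be queried from inside a test program. The following is a minimal sketch that assumes a glibc system: gnu_get_libc_version() and gnu_get_libc_release() are GNU extensions declared in <gnu/libc-version.h> and are not available on other C libraries. The helper name format_glibc_version is hypothetical.

```c
#include <assert.h>
#include <gnu/libc-version.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Write "<version> (<release>)" for the glibc this process is actually
 * running against into buf. Returns snprintf()'s result, i.e. the
 * length the full string needs (excluding the terminating NUL).
 */
int format_glibc_version(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    return snprintf(buf, len, "%s (%s)",
                    gnu_get_libc_version(), gnu_get_libc_release());
}
```

Printing this string at the start of the test output would record the glibc version next to each uname -a line, without needing rpm, yum, or root access.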