setcap in an unprivileged LXC container works, but verification with `-v` does not:

```
# cp /bin/ping /tmp
# su - x -c "/tmp/ping google.com"
/tmp/ping: socket: Address family not supported by protocol
# setcap cap_net_raw+ep /tmp/ping
# su - x -c "/tmp/ping google.com"
PING google.com (142.250.74.206) 56(84) bytes of data.
# setcap -v cap_net_raw+ep /tmp/ping
nsowner[got=1000000, want=0],/tmp/ping differs in []
```

Adding `-n 1000000` makes the verification pass, but I would expect this to be transparent.
What version of libcap are you using? When I build the libcap-2.44 sources as follows

    $ make DYNAMIC=no clean all

under a Linux container on my Chromebook, I find:

    $ cd progs
    $ cat /proc/self/uid_map
    0 1000000 1000
    1000 1000 1
    1001 1001 1
    1002 1001002 654358
    655360 655360 1
    655361 1655361 9996
    665357 665357 1
    665358 1665358 999334642
    $ sudo ./setcap cap_setfcap=i ./setcap
    $ ./getcap ./setcap
    ./setcap cap_setfcap=i
    $ ./setcap -v cap_setfcap=i ./setcap
    ./setcap: OK
I tried 2.44 and 2.43 with the same result. Here are your exact same commands with 2.44:

```
$ make DYNAMIC=no clean all
$ cd progs
$ cat /proc/self/uid_map
0 1000000 65536
$ sudo ./setcap cap_setfcap=i ./setcap
$ ./getcap ./setcap
./setcap cap_setfcap=i
$ ./setcap -v cap_setfcap=i ./setcap
nsowner[got=1000000, want=0],./setcap differs in []
```

Are there differences in our kernel config that may explain this? Or in the container config? `lxc config show` says this:

```
volatile.idmap.base: "0"
volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":65536}]'
volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":65536}]'
volatile.last_state.idmap: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":65536},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":65536}]'
```
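For readers following the numbers: each `uid_map` line reads as `ID-inside-ns ID-outside-ns length`, so with the single line `0 1000000 65536` the outer UID 1000000 that `setcap -v` prints as `got` corresponds to UID 0 inside the container, which is the `want` value. A minimal C sketch of that translation follows; the struct and function names are made up for illustration and are not libcap or kernel APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* One line of /proc/<pid>/uid_map: IDs [ns_start, ns_start + count) inside
 * the namespace correspond to [outer_start, outer_start + count) outside.
 * Hypothetical type, for illustration only. */
struct uid_extent {
    uint32_t ns_start;
    uint32_t outer_start;
    uint32_t count;
};

/* Translate a UID as seen outside the namespace into the namespace.
 * Returns 0 and stores the result in *ns_uid, or -1 if the UID is unmapped. */
int outer_to_ns_uid(const struct uid_extent *map, size_t n,
                    uint32_t outer_uid, uint32_t *ns_uid)
{
    for (size_t i = 0; i < n; i++) {
        if (outer_uid >= map[i].outer_start &&
            outer_uid - map[i].outer_start < map[i].count) {
            *ns_uid = map[i].ns_start + (outer_uid - map[i].outer_start);
            return 0;
        }
    }
    return -1;
}
```

Called with the single extent `{0, 1000000, 65536}` and outer UID 1000000, this stores 0 in `*ns_uid`; outer UIDs below 1000000 or at 1065536 and above are reported as unmapped.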
It could well be some magic on a Chromebook, not sure where to start looking. From the progs directory, I get these mount options for my container:

    $ mount|fgrep " $(df .|fgrep /|awk '{print $6}') "
    /dev/vdb on / type btrfs (rw,relatime,discard,space_cache,user_subvol_rm_allowed,subvolid=266,subvol=/lxd/storage-pools/default/containers/penguin/rootfs)
Just to confirm, is there anything interesting about your kernel? You say you are running 5.4.66. Is there anything else interesting to say about it?
The kernel should be returning the converted capabilities to libcap, whereas this looks like libcap is getting the capabilities as they look from the initial user namespace.

The contents of /proc/self/status and /proc/self/uid_map would be interesting, as well as your kernel .config.

Also if you could compile the following program and run it against that /tmp/ping both from a host process and a container process, that would be helpful. On my system, from the host it tells me it found a v3 capability targeted at uid 100000, while from the container it finds a v2 capability.

    #include <stdio.h>
    #include <unistd.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/xattr.h>
    #include <errno.h>

    typedef __uint32_t u32;

    #define VFS_CAP_U32_3 2
    #define VFS_CAP_U32 VFS_CAP_U32_3

    struct vfs_ns_cap_data {
        u32 magic_etc;
        struct {
            u32 permitted;    /* Little endian */
            u32 inheritable;  /* Little endian */
        } data[VFS_CAP_U32];
        u32 rootid;
    };

    #define VFS_CAP_FLAGS_EFFECTIVE 0x000001
    #define VFS_CAP_REVISION_2 0x02000000
    #define VFS_CAP_REVISION_3 0x03000000

    int sans_flags(int x)
    {
        return x & ~VFS_CAP_FLAGS_EFFECTIVE;
    }

    int main(int argc, char **argv)
    {
        ssize_t sz;
        char v[200];
        struct vfs_ns_cap_data *nscap;

        sz = getxattr(argv[1], "security.capability", v, 200);
        if (sz < 0) {
            printf("Failed reading xattr: %d\n", errno);
            exit(1);
        }
        if (sz > 200) {
            printf("xattr too long");
            exit(1);
        }
        nscap = (struct vfs_ns_cap_data *)v;
        switch (sans_flags(nscap->magic_etc)) {
        case VFS_CAP_REVISION_3:
            printf("v3, rootid is %d\n", nscap->rootid);
            break;
        case VFS_CAP_REVISION_2:
            printf("v2\n");
            break;
        default:
            printf("unknown version %x\n", nscap->magic_etc);
            break;
        }
        if (sz != sizeof(struct vfs_ns_cap_data)) {
            printf("sz was %ld should be %ld\n", sz, sizeof(struct vfs_ns_cap_data));
            exit(1);
        }
    }
(In reply to Andrew G. Morgan from comment #3)
> $ mount|fgrep " $(df .|fgrep /|awk '{print $6}') "
> /dev/vdb on / type btrfs
> (rw,relatime,discard,space_cache,user_subvol_rm_allowed,subvolid=266,subvol=/lxd/storage-pools/default/containers/penguin/rootfs)

```
# mount|fgrep " $(df .|fgrep /|awk '{print $6}') "
pool/containers/esel on / type zfs (rw,xattr,posixacl)
```

(In reply to Andrew G. Morgan from comment #4)
> Just to confirm, is there anything interesting about your kernel? You say
> you are running 5.4.66. Is there anything else interesting to say about it?

It's a custom-configured kernel. One thing that is maybe not too common is that I built zfs in, without module support.

(In reply to Serge Hallyn from comment #5)
> The contents of /proc/self/status and /proc/self/uid_map would be
> interesting, as well as your kernel .config.

I attached my `.config` to this report. `/proc/self/uid_map` is the one reported above, just one line: `0 1000000 65536`.

```
# cat /proc/self/status
Name: cat
Umask: 0022
State: R (running)
Tgid: 19117
Ngid: 0
Pid: 19117
PPid: 19107
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
FDSize: 256
Groups: 0 1 2 3 4 6 10 11 26 27
NStgid: 19117
NSpid: 19117
NSpgid: 19117
NSsid: 19106
VmPeak: 6864 kB
VmSize: 6864 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 444 kB
VmRSS: 444 kB
RssAnon: 64 kB
RssFile: 380 kB
RssShmem: 0 kB
VmData: 312 kB
VmStk: 132 kB
VmExe: 24 kB
VmLib: 1416 kB
VmPTE: 48 kB
VmSwap: 0 kB
CoreDumping: 0
THP_enabled: 1
Threads: 1
SigQ: 0/63688
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000000000000
CapInh: 0000000000000000
CapPrm: 0000003fffffffff
CapEff: 0000003fffffffff
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp: 2
Speculation_Store_Bypass: thread force mitigated
Cpus_allowed: f
Cpus_allowed_list: 0-3
Mems_allowed: 1
Mems_allowed_list: 0
voluntary_ctxt_switches: 0
nonvoluntary_ctxt_switches: 0
```

> Also if you could compile the following program and run it against that
> /tmp/ping both from a host process and a container process, that would be
> helpful. On my system, from the host it tells me it found a v3 capability
> targeted at uid 100000, while from the container it finds a v2 capability.

The output is the same for me from both container and host: `v3, rootid is 1000000`.
(In reply to Hervé Guillemet from comment #6)
> (In reply to Andrew G. Morgan from comment #4)
> > Just to confirm, is there anything interesting about your kernel? You say
> > you are running 5.4.66. Is there anything else interesting to say about it?
>
> It's a custom-configured kernel. One thing that is maybe not too common is
> that I built zfs in, without module support.

It also has the Gentoo patches.
Created attachment 293009: Kernel .config
Hi - do you have a publicly accessible repo with the branch that has all the patches you used? I've seen some wrong attempts at gluing together patchsets, at least for various phone-specific kernels, in the past.
You mean the Gentoo kernel patches? You can find them here: https://dev.gentoo.org/~mpagano/genpatches
Don't waste time reviewing the Gentoo patches: I just rebooted with a vanilla 5.4.66, without the Gentoo patches, and the bug is still there.
Tried to reproduce this on 5.8.0-21-generic #22-Ubuntu using a zfs filesystem, but still can't:

    ubuntu@brauner:/newpool$ sudo setcap -v cap_net_raw+ep ./capsh
    ./capsh: OK
I now run on 5.8.16 and I still have:

```
# setcap cap_net_raw+ep ./ping
# setcap -v cap_net_raw+ep ./ping
nsowner[got=1000000, want=0],./ping differs in []
```
I spun up a Gentoo linode:

    Linux localhost 5.4.48-gentoo-x86_64 #1 SMP Mon Aug 10 21:25:02 UTC 2020 x86_64 AMD EPYC 7542 32-Core Processor AuthenticAMD GNU/Linux

I was still not able to reproduce. To keep things simpler, can you please 'emerge lxc', then as an unprivileged user run lxc-usernsexec. As root on the host:

    cp /sbin/capsh /tmp/
    setcap -n 100000 cap_sys_admin+pe /tmp/capsh
    caplook /tmp/capsh

where 'caplook' is the program from comment #5. Then, as the unprivileged user under lxc-usernsexec, also do

    caplook /tmp/capsh

Do you get the same results? You should have a v2 cap as the unprivileged user, and a v3 cap as root.
I get `v3, rootid is 100000` both as host root and under lxc-usernsexec.
`CONFIG_SECURITY` was not set. When it is set, I see a v2 cap in containers.

It is far from obvious from the Kconfig help text that this setting is necessary, or even that it does anything by itself. Maybe a dependency is missing.

`caplook` now also says `sz was 20 should be 24`. Does this mean something else is wrong?
> --- Comment #16 from Hervé Guillemet (herve@guillemet.org) ---
> `CONFIG_SECURITY` was not set. When set, I see v2 cap in containers.

Thanks!

> That's far from obvious from the Kconfig help text than this setting is
> necessary, and even that is does something by itself. Maybe some dependencies
> is lacking.

Ok - I think this may be a side effect of how the LSM stacking stuff is working. (Cc:d Casey). Casey, 'git blame' tells me that the #ifdef CONFIG_SECURITY at bottom of security/commoncap.c came from

    commit b1d9e6b0646d0e5ee5d9050bd236b6c65d66faef
    Author: Casey Schaufler <casey@schaufler-ca.com>
    Date: Sat May 2 15:11:42 2015 -0700

        LSM: Switch to lists of hooks

But commoncap should not require CONFIG_SECURITY, historically.

> `caplook` now also says `sz was 20 should be 24`. Does this mean something
> else is wrong ?

I saw that too, I think it's a bug in my little test program :)
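The `sz was 20 should be 24` message is consistent with a v2 attribute: one 32-bit `magic_etc` plus two pairs of 32-bit permitted/inheritable words is 20 bytes, and only v3 appends the 4-byte `rootid` to reach 24. Below is a hedged sketch of a check that accepts both layouts. It operates on a buffer already read with getxattr(2), and the macro and function names are illustrative rather than copied from the kernel headers.

```c
#include <stddef.h>
#include <stdint.h>

#define CAP_REV_MASK 0xFF000000u
#define CAP_REV_2    0x02000000u
#define CAP_REV_3    0x03000000u
#define CAP_V2_SIZE  20  /* magic_etc + 2 * (permitted, inheritable) */
#define CAP_V3_SIZE  24  /* the v2 layout followed by a 32-bit rootid */

/* Read a little-endian 32-bit value regardless of host byte order. */
static uint32_t le32_at(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns 2 or 3 for a well-formed v2 or v3 attribute, -1 otherwise.
 * For v3, *rootid receives the stored root user ID. */
int fscap_revision(const unsigned char *buf, size_t len, uint32_t *rootid)
{
    uint32_t rev;

    if (len < 4)
        return -1;
    rev = le32_at(buf) & CAP_REV_MASK;
    if (rev == CAP_REV_2 && len == CAP_V2_SIZE)
        return 2;
    if (rev == CAP_REV_3 && len == CAP_V3_SIZE) {
        *rootid = le32_at(buf + CAP_V2_SIZE);
        return 3;
    }
    return -1;
}
```

With this shape, a 20-byte v2 attribute seen from inside the container and a 24-byte v3 attribute seen from the host are both accepted, and a revision whose length does not match is rejected.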
On 10/27/2020 6:09 AM, Serge E. Hallyn wrote:
>> --- Comment #16 from Hervé Guillemet (herve@guillemet.org) ---
>> `CONFIG_SECURITY` was not set. When set, I see v2 cap in containers.
> Thanks!
>
>> That's far from obvious from the Kconfig help text than this setting is
>> necessary, and even that is does something by itself. Maybe some
>> dependencies is lacking.
> Ok - I think this may be a side effect of how the LSM stacking stuff is
> working. (Cc:d Casey). Casey, 'git blame' tells me that the #ifdef
> CONFIG_SECURITY at bottom of security/commoncap.c came from
>
> commit b1d9e6b0646d0e5ee5d9050bd236b6c65d66faef
> Author: Casey Schaufler <casey@schaufler-ca.com>
> Date: Sat May 2 15:11:42 2015 -0700
>
> LSM: Switch to lists of hooks
>
> But commoncap should not require CONFIG_SECURITY, historically.

There's always been a difference between calling security_capable() and cap_capable() when CONFIG_SECURITY is unset. The #ifdef is required because the data structures being used to register security modules aren't defined when LSMs aren't configured.

>> `caplook` now also says `sz was 20 should be 24`. Does this mean something
>> else is wrong ?
> I saw that too, I think it's a bug in my little test program :)
On Tue, Oct 27, 2020 at 08:52:55AM -0700, Casey Schaufler wrote:
> On 10/27/2020 6:09 AM, Serge E. Hallyn wrote:
> >> --- Comment #16 from Hervé Guillemet (herve@guillemet.org) ---
> >> `CONFIG_SECURITY` was not set. When set, I see v2 cap in containers.
> > Thanks!
> >
> >> That's far from obvious from the Kconfig help text than this setting is
> >> necessary, and even that is does something by itself. Maybe some
> >> dependencies is lacking.
> > Ok - I think this may be a side effect of how the LSM stacking stuff is
> > working. (Cc:d Casey). Casey, 'git blame' tells me that the #ifdef
> > CONFIG_SECURITY at bottom of security/commoncap.c came from
> >
> > commit b1d9e6b0646d0e5ee5d9050bd236b6c65d66faef
> > Author: Casey Schaufler <casey@schaufler-ca.com>
> > Date: Sat May 2 15:11:42 2015 -0700
> >
> > LSM: Switch to lists of hooks
> >
> > But commoncap should not require CONFIG_SECURITY, historically.
>
> There's always been a difference between calling security_capable()
> and cap_capable() when CONFIG_SECURITY is unset. The #ifdef is required
> because the data structures being used to register security modules
> aren't defined when LSMs aren't configured.

That's understandable, but it's still a regression (even if it came in years ago), and not matched by a change in documentation.

As Hervé says, security/Kconfig says about CONFIG_SECURITY:

    If this option is not selected, the default Linux security
    model will be used.

So the simplest fix is probably to change that text.
So the root cause here is a kernel configuration issue? Should we change the component for this bug so that it is tracked correctly?
On 10/27/2020 9:36 AM, Serge E. Hallyn wrote:
> On Tue, Oct 27, 2020 at 08:52:55AM -0700, Casey Schaufler wrote:
>> On 10/27/2020 6:09 AM, Serge E. Hallyn wrote:
>>>> --- Comment #16 from Hervé Guillemet (herve@guillemet.org) ---
>>>> `CONFIG_SECURITY` was not set. When set, I see v2 cap in containers.
>>> Thanks!
>>>
>>>> That's far from obvious from the Kconfig help text than this setting is
>>>> necessary, and even that is does something by itself. Maybe some
>>>> dependencies is lacking.
>>> Ok - I think this may be a side effect of how the LSM stacking stuff is
>>> working. (Cc:d Casey). Casey, 'git blame' tells me that the #ifdef
>>> CONFIG_SECURITY at bottom of security/commoncap.c came from
>>>
>>> commit b1d9e6b0646d0e5ee5d9050bd236b6c65d66faef
>>> Author: Casey Schaufler <casey@schaufler-ca.com>
>>> Date: Sat May 2 15:11:42 2015 -0700
>>>
>>> LSM: Switch to lists of hooks
>>>
>>> But commoncap should not require CONFIG_SECURITY, historically.
>> There's always been a difference between calling security_capable()
>> and cap_capable() when CONFIG_SECURITY is unset. The #ifdef is required
>> because the data structures being used to register security modules
>> aren't defined when LSMs aren't configured.
> That's understandable, but it's still a regression (even if it came in
> years ago), and not matched by change in documentation.
>
> As Hervé says, the security/Kconfig says about CONFIG_SECURITY:
>
> If this option is not selected, the default Linux security
> model will be used.
>
> So the simplest fix is probably to change that text.

Ah, but to what? I can't say that I've completely understood the discussion.
From my point of view as an end user, the issue is that this feature, documented in capabilities(7):

“Correspondingly, when a version 3 security.capability attribute is retrieved (getxattr(2)) by a process that resides inside a user namespace that was created by the root user ID (or a descendant of that user namespace), the returned attribute is (automatically) simplified to appear as a version 2 attribute (i.e., the returned value is the size of a version 2 attribute and does not include the root user ID). These automatic translations mean that no changes are required to user-space tools (e.g., setcap(1) and getcap(1)) in order for those tools to be used to create and retrieve version 3 security.capability attributes.”

does not work if CONFIG_SECURITY is not set. If this cannot be fixed, my feeling is that if namespaced file capabilities are selected in the kernel config, CONFIG_SECURITY should be selected automatically.
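To make the quoted "simplification" concrete, here is a user-space model of it in C, not the kernel's code. It assumes a deliberately reduced rule: the conversion applies only when the stored rootid equals the UID that owns the reader's namespace (the man page also covers descendant namespaces, which this sketch ignores). All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CAP_REV_MASK       0xFF000000u
#define CAP_REV_2          0x02000000u
#define CAP_REV_3          0x03000000u
#define CAP_FLAG_EFFECTIVE 0x00000001u
#define CAP_V2_SIZE        20
#define CAP_V3_SIZE        24

/* Read a little-endian 32-bit value regardless of host byte order. */
static uint32_t get_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Store a 32-bit value in little-endian byte order. */
static void put_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
    p[2] = (unsigned char)((v >> 16) & 0xFF);
    p[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Model of the v3 -> v2 view: if `in` is a v3 attribute whose rootid equals
 * ns_owner_uid, write the 20-byte v2 form to `out` and return its size.
 * Returns -1 when the input is not v3 or the rootid does not match. */
int simplify_v3_to_v2(const unsigned char in[CAP_V3_SIZE],
                      uint32_t ns_owner_uid,
                      unsigned char out[CAP_V2_SIZE])
{
    uint32_t magic = get_le32(in);

    if ((magic & CAP_REV_MASK) != CAP_REV_3)
        return -1;
    if (get_le32(in + CAP_V2_SIZE) != ns_owner_uid)
        return -1;

    /* Keep the capability words, drop the rootid, relabel as revision 2
     * while preserving the "effective" flag bit. */
    memcpy(out, in, CAP_V2_SIZE);
    put_le32(out, CAP_REV_2 | (magic & CAP_FLAG_EFFECTIVE));
    return CAP_V2_SIZE;
}
```

In terms of this thread, the container's `setcap -v` expected the 20-byte result (hence `want=0`), but without CONFIG_SECURITY it was handed the untranslated 24-byte attribute with rootid 1000000.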
I'm fine with that (enabling CONFIG_SECURITY). Some may object, but I'll float a patch soon.
On Tue, Oct 27, 2020 at 11:31:49AM -0700, Casey Schaufler wrote:
> On 10/27/2020 9:36 AM, Serge E. Hallyn wrote:
> > On Tue, Oct 27, 2020 at 08:52:55AM -0700, Casey Schaufler wrote:
> >> On 10/27/2020 6:09 AM, Serge E. Hallyn wrote:
> >>>> --- Comment #16 from Hervé Guillemet (herve@guillemet.org) ---
> >>>> `CONFIG_SECURITY` was not set. When set, I see v2 cap in containers.
> >>> Thanks!
> >>>
> >>>> That's far from obvious from the Kconfig help text than this setting is
> >>>> necessary, and even that is does something by itself. Maybe some
> >>>> dependencies is lacking.
> >>> Ok - I think this may be a side effect of how the LSM stacking stuff is
> >>> working. (Cc:d Casey). Casey, 'git blame' tells me that the #ifdef
> >>> CONFIG_SECURITY at bottom of security/commoncap.c came from
> >>>
> >>> commit b1d9e6b0646d0e5ee5d9050bd236b6c65d66faef
> >>> Author: Casey Schaufler <casey@schaufler-ca.com>
> >>> Date: Sat May 2 15:11:42 2015 -0700
> >>>
> >>> LSM: Switch to lists of hooks
> >>>
> >>> But commoncap should not require CONFIG_SECURITY, historically.
> >> There's always been a difference between calling security_capable()
> >> and cap_capable() when CONFIG_SECURITY is unset. The #ifdef is required
> >> because the data structures being used to register security modules
> >> aren't defined when LSMs aren't configured.
> > That's understandable, but it's still a regression (even if it came in
> > years ago), and not matched by change in documentation.
> >
> > As Hervé says, the security/Kconfig says about CONFIG_SECURITY:
> >
> > If this option is not selected, the default Linux security
> > model will be used.
> >
> > So the simplest fix is probably to change that text.
>
> Ah, but to what? I can't say that I've completely understood
> the discussion.

Nah, there's a better fix - apologies, Casey, this was not a regression, it was broken from the start. Patch incoming.
Resolved with: https://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git/commit/?h=fixes-v5.10&id=ed9b25d1970a4787ac6a39c2091e63b127ecbfc1