Bug 219912 - in a container psx_test hangs
Summary: in a container psx_test hangs
Status: NEW
Alias: None
Product: Tools
Classification: Unclassified
Component: libcap
Hardware: All Linux
Importance: P3 normal
Assignee: Tools/Libcap default virtual assignee
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2025-03-22 19:39 UTC by Andrew G. Morgan
Modified: 2025-03-26 05:10 UTC

See Also:
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:


Description Andrew G. Morgan 2025-03-22 19:39:48 UTC
This bug was noted while resolving a different bug: https://bugzilla.kernel.org/show_bug.cgi?id=219880

The issue seems to be due to the PSX mechanism failing to work in this environment.

root@7964c4e70be8:/sources/libcap2-2.75/tests# make test
/usr/bin/make run_psx_test run_libcap_psx_test
make[1]: Entering directory '/sources/libcap2-2.75/tests'
gcc -O2  -Wall -Wwrite-strings -Wpointer-arith -Wcast-qual -Wcast-align -Wstrict-prototypes -Wmissing-prototypes -Wnested-externs -Winline -Wshadow -Wunreachable-code -Dlinux -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -I/sources/libcap2-2.75/tests/../libcap/include/uapi -I/sources/libcap2-2.75/tests/../libcap/include  psx_test.c -o psx_test -Wl,-rpath,../libcap -L/sources/libcap2-2.75/tests/../libcap -Wl,--no-as-needed -Wl,--whole-archive -lpsx -Wl,--no-whole-archive -Wl,--as-needed -lpthread
./psx_test
iteration [1244]: 0
xxx
^Cmake[1]: *** [Makefile:72: run_psx_test] Interrupt
make: *** [Makefile:57: test] Interrupt

root@7964c4e70be8:/sources/libcap2-2.75/tests# ldd psx_test
        libpsx.so.2 => ../libcap/libpsx.so.2 (0x00007fea0a3e0000)
        libc.so.6 => /lib/mips64el-linux-gnuabi64/libc.so.6 (0x00007fea0a1b0000)
        /lib64/ld.so.1 (0x00007fea0b8b7000)

I placed a printf("xxx\n") and a printf("yyy\n") immediately before and after the psx_syscall() call:

        pthread_mutex_unlock(&mu);
printf("xxx\n");
        psx_syscall(SYS_prctl, PR_SET_KEEPCAPS, global_kept);
printf("yyy\n");
        pthread_mutex_lock(&mu);

As can be seen above, "xxx" is printed but "yyy" never is, so the code appears to hang somewhere inside the psx_syscall() call.

Comment 1 Andrew G. Morgan 2025-03-22 19:42:24 UTC
This does not fail on a system image with QEMU for the mips64el build. It fails only when run in a container.

Comment 2 Andrew G. Morgan 2025-03-22 20:52:51 UTC
On the surface it looks as if the threads are not working with consistent signal blocking.

Of the two status dumps below, the first is for the thread that is performing the psx_syscall() and the second is for one of the other threads, which should be receiving signal 33. What is odd is that it is actually receiving (and blocking) a different signal. That is:

(gdb) p/x 1ULL <<(33-1)
$4 = 0x100000000

but the pending signal mask is 0x800000000, which is bit 35, that is, signal 36:

root@7964c4e70be8:/sources/libcap2-2.75/tests# grep Sig /proc/5713/status
SigQ:   3/255093
SigPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000400000000
SigCgt: 0000000b000004dc
root@7964c4e70be8:/sources/libcap2-2.75/tests# grep Sig /proc/5715/status
SigQ:   3/255093
SigPnd: 0000000800000000
SigBlk: fffffffe7ffbfa77
SigIgn: 0000000400000000
SigCgt: 0000000b000004dc

Comment 3 Andrew G. Morgan 2025-03-22 21:11:02 UTC
Curious. If I change the line in psx.c that defines the signal number from 33 to 30, I get:

root@7964c4e70be8:/sources/libcap2-2.75/tests# grep Sig /proc/7012/status
SigQ:   3/255093
SigPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000c00000000
SigCgt: 00000003008004dc
root@7964c4e70be8:/sources/libcap2-2.75/tests# grep Sig /proc/7014/status
SigQ:   3/255093
SigPnd: 0000000000800000
SigBlk: fffffffe7ffbfa77
SigIgn: 0000000c00000000
SigCgt: 00000003008004dc

Changing it to 31, I get:

root@7964c4e70be8:/sources/libcap2-2.75/tests# grep Sig /proc/7376/status
SigQ:   3/255093
SigPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000c00000000
SigCgt: 00000003010004dc
root@7964c4e70be8:/sources/libcap2-2.75/tests# grep Sig /proc/7378/status
SigQ:   3/255093
SigPnd: 0000000001000000
SigBlk: fffffffe7ffbfa77
SigIgn: 0000000c00000000
SigCgt: 00000003010004dc

Changing it to 32, the mask jumps noticeably:

root@7964c4e70be8:/sources/libcap2-2.75/tests# grep Sig /proc/7740/status
SigQ:   6/255093
SigPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000800000000
SigCgt: 00000007000004dc
root@7964c4e70be8:/sources/libcap2-2.75/tests# grep Sig /proc/7742/status
SigQ:   6/255093
SigPnd: 0000000400000000
SigBlk: fffffffe7ffbfa77
SigIgn: 0000000800000000
SigCgt: 00000007000004dc

In all of these cases, the signal that is actually pending is blocked by the thread's SigBlk mask.

Comment 4 Andrew G. Morgan 2025-03-26 05:10:20 UTC
Christian observes that "the problem does not manifest when host and guest architecture match", which suggests that there is something odd with QEMU acting as the target architecture vs. running "native".
