Bug 206207 - mq_notify loses notification request on fork()
Summary: mq_notify loses notification request on fork()
Status: NEW
Alias: None
Product: Other
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: other_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-01-14 23:43 UTC by Aleksei Mateosian
Modified: 2020-01-15 21:36 UTC

See Also:
Kernel Version: all
Subsystem:
Regression: No
Bisected commit-id:


Description Aleksei Mateosian 2020-01-14 23:43:30 UTC
This bug concerns the implementation of POSIX message queues (mqueues), specifically the mq_notify() syscall.

The problem: if a process subscribes to notifications with mq_notify() and then calls fork(), and that fork() fails for some reason (for example, there are pending signals for the parent process), the subscription to the notification is lost.

The root cause: copy_process() duplicates the parent's open file descriptors. If something then goes wrong inside copy_process() (for example, there are pending signals for the parent process), the duplicated file descriptors are cleaned up via exit_files(), which eventually calls filp_close() and, through it, flush(). However, ipc/mqueue.c implements file_operations.flush so that it removes the notification registered by the current process (here, still the parent), which is certainly not the designed (desired) behavior.

There is another way to reproduce this behavior. Do fd = mq_open("my_queue"); mq_notify(fd, ...); and never call close(fd), so that the notification stays registered. If the same process then open()s and close()s the queue file, the notification request (subscription) for the still-open fd is lost. In other words, with the current behavior the first close() removes the notification.
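
A minimal sketch of this second reproduction path follows. The function name, the choice of SIGUSR1, the one-second timeout, and the use of a second mq_open()/mq_close() pair in place of open()/close() on the queue file are all assumptions made for illustration. The function returns 1 if the notification was delivered, 0 if it was lost, and -1 if the setup failed.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <mqueue.h>
#include <signal.h>
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: 1 = notification delivered, 0 = lost, -1 = setup error. */
int notification_survives_second_close(const char *name)
{
    struct mq_attr attr = { .mq_maxmsg = 1, .mq_msgsize = 8 };
    struct sigevent sev = { .sigev_notify = SIGEV_SIGNAL, .sigev_signo = SIGUSR1 };
    struct timespec timeout = { .tv_sec = 1, .tv_nsec = 0 };
    sigset_t set, old;
    int result = -1;

    mq_unlink(name); /* start from an empty queue; an error here is harmless */
    mqd_t fd = mq_open(name, O_CREAT | O_RDWR | O_NONBLOCK, 0600, &attr);
    if (fd == (mqd_t)-1)
        return -1;

    /* Block SIGUSR1 so that it can be collected synchronously below. */
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    sigprocmask(SIG_BLOCK, &set, &old);

    if (mq_notify(fd, &sev) == 0) {
        /* Open and close the same queue again while fd stays open. */
        mqd_t second = mq_open(name, O_RDONLY);
        if (second != (mqd_t)-1 && mq_close(second) == 0 &&
            mq_send(fd, "x", 1, 0) == 0)
            result = (sigtimedwait(&set, NULL, &timeout) == SIGUSR1);
    }

    sigprocmask(SIG_SETMASK, &old, NULL);
    mq_close(fd);
    mq_unlink(name);
    return result;
}
```

Calling notification_survives_second_close("/my_queue") on a kernel with the behavior described above would be expected to return 0. Note that mq_open() requires the leading slash in the queue name, and older glibc versions need -lrt at link time.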

If remove_notification() is called from file_ops.replace instead of file_ops.flush in mqueue.c, the fork()-related problem described above is fixed. However, this changes the behavior so that the last close() removes the notification.
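
The difference between dropping the subscription on flush and dropping it on the last close can be illustrated with a small userspace model. This is only a toy sketch, not kernel code: the toy_queue and toy_file structures, the drop_policy enum, and the PID value 100 are hypothetical names and values introduced for illustration. The model assumes that a failed fork amounts to taking an extra reference on the open file and then closing it again from the parent's context.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical policies: drop the subscription on every close (flush)
 * or only when the last reference to the file goes away. */
enum drop_policy { DROP_ON_FLUSH, DROP_ON_RELEASE };

struct toy_queue { int notify_owner; };              /* 0 = no subscription */
struct toy_file  { struct toy_queue *queue; int refcount; };

/* Models one close of a file reference by the process `caller`. */
static void toy_close(struct toy_file *f, int caller, enum drop_policy policy)
{
    assert(f->refcount > 0);
    bool is_last = (--f->refcount == 0);
    bool drop = (policy == DROP_ON_FLUSH) || is_last;

    if (drop && f->queue->notify_owner == caller)
        f->queue->notify_owner = 0;
}

/* Returns true if the parent's subscription is still registered after
 * a fork attempt that duplicates the file and then cleans it up. */
bool subscription_survives_failed_fork(enum drop_policy policy)
{
    const int parent = 100;                              /* arbitrary PID */
    struct toy_queue q = { .notify_owner = 0 };
    struct toy_file f = { .queue = &q, .refcount = 1 };  /* mq_open() */

    q.notify_owner = parent;        /* mq_notify() */
    f.refcount++;                   /* fd table duplicated during fork */
    toy_close(&f, parent, policy);  /* cleanup after the fork fails */

    return q.notify_owner == parent;
}
```

In this model, subscription_survives_failed_fork(DROP_ON_FLUSH) returns false and subscription_survives_failed_fork(DROP_ON_RELEASE) returns true, because the parent still holds its own reference when the duplicated one is closed.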
Comment 1 Aleksei Mateosian 2020-01-14 23:57:20 UTC
Correction: in the last paragraph of the description I meant file_ops.release, not file_ops.replace.
