|Summary:||Misusing the process events connector API in one listener process can impact other listener processes|
|Product:||Process Management||Reporter:||Natan Yellin (natan)|
Description Natan Yellin 2020-04-20 15:43:53 UTC
How to reproduce: Open a netlink socket in one usermode process and send multiple control messages containing PROC_CN_MCAST_IGNORE without sending PROC_CN_MCAST_LISTEN first. Now the netlink process connector wont work for this process or *other* usermode listener process until enough PROC_CN_MCAST_LISTEN messages are sent to rebalance the negative number of listeners. Killing the original process doesn't fix the issue. The underlying issue: In drivers/connector/cn_proc.c there is a global counter, proc_event_num_listeners. Every call to PROC_CN_MCAST_IGNORE decrements this counter globally without regard to the number of times that the relevant process incremented it first. Why this is a bug: This issue lets one usermode process with relevant permissions globally mess up the netlink process connector for all processes until reboot. The issue is fixable before reboot if you can guess how many times you need to send PROC_CN_MCAST_LISTEN to rebalance things. (You can.) It doesn't seem desirable to require one usermode program to jump through hoops if another usermode program messed up the counter first. From personal experience, it also is an annoying non-obvious bug to encounter in usermode. Suggested fix: I would like to submit a patch to fix this, if appropriate. I suggest adding a per-netlink-socket boolean which will represent the current listen-state for that socket. All calls with PROC_CN_MCAST_LISTEN will be ignored on sockets already in listen state and all calls with PROC_CN_MCAST_IGNORE will be ignored on sockets not in listening state. This way the counter can never be set to a negative number of listeners. Furthermore, we will be able to cleanup the global counter when a socket is closed for listeners who forgot to call PROC_CN_MCAST_IGNORE. (In the current code those listeners are never cleaned up - the counter remains positive ruining the point of the optimisation that the counter provides.) Does this fix make sense? If so, where is the appropriate place to add such a per-netlink-socket boolean.