Linux 3.7 release candidates have a bug that makes Mozilla Firefox 16.0.2 hang for minutes on facebook.com if the flash plugin is enabled on Fedora 17 with SELinux. The commit that introduced the breakage is: commit c3c073f808b22dfae15ef8412b6f7b998644139a Author: Al Viro <viro@zeniv.linux.org.uk> Date: Tue Aug 21 22:32:06 2012 -0400 new helper: iterate_fd() iterates through the opened files in given descriptor table, calling a supplied function; we stop once non-zero is returned. Callback gets struct file *, descriptor number and const void * argument passed to iterator. It is called with files->file_lock held, so it is not allowed to block. tty_io, netprio_cgroup and selinux flush_unauthorized_files() converted to its use. I believe the change is not supposed to have any user-visible effects whatsoever. Reverting the portion of the patch that applied to security/selinux/hooks.c fixes the breakage. Debugging shows that the old (working) code in flush_unauthorized_files() would call replace_fd() on the file descriptor 2. The new code calls replace_fd() on the file descriptor 3. There is an off-by-one error somewhere. I believe match_file() in security/selinux/hooks.c should return fd instead of fd + 1. Both iterate_fd() and match_file() increment the file descriptor they return. The argument to replace_fd() is decremented by 1, which is not sufficient to compensate for two increments.
Created attachment 86131 [details] Fix for match_file(), successfully tested
I've checked other uses of iterate_fd(). drivers/tty/tty_io.c appears to have the same problem. The match function this_tty() gets an incremented file descriptor and increments it again. Fortunately, the result of iterate_fd() is only used in printk(). arch/powerpc/platforms/cell/spufs/coredump.c may have a real problem, as the the result of iterate_fd() is returned. We have the same pattern there - two increments and one decrement by 1. Callers of coredump_next_context() increment fd on their own, so it's possible that fd skips every other file descriptor. net/core/netprio_cgroup.c does some fishy things with casts. But the match function update_netprio() always returns 0 and ignores the fd argument (called "n" there), so it should not be affected. It's disappointing that a new exported function (interate_fd) is introduced with no documentation. Also, the choice of 0 as the special value is a poor choice for a function working with file descriptors. -ENOENT would be the the Linux way. -1 might be OK (the UNIX userspace way). But 0 is insane IMHO. That's what causes this mess with increments.
I'm afraid I'll not have time to implement a complete rewrite for iterate_fd() any time soon. I have no hardware to test some cases and I would have to be extremely careful with that code. I hoped Al Viro would participate, but I cannot even add him to CC (please help if you can). I think the best approach for Linux 3.7 would be to revert the patch that introduced iterate_fd(). That would restore the tested code. Further patches to introduce iterate_fd() should be tested and reviewed properly.
Viro doesn't do bugzilla so you won't be able to add him I'd suggest you email Linus including the test case and ask him to pull it and Cc viro on the email.
Created attachment 219101 [details] watch
Fixed in commit 45525b26a46cd593cb72070304c4cd7c8391bd37 (v3.7-rc1-6-g45525b26a46c), closing.