Created attachment 73503 [details] Test program See the attached C program: $ gcc -o leasesig leasesig.c $ ./leasesig foo opening foo in read-only mode calling SETSIG SIGUSR1 on file descriptor F_GETSIG result: 10 calling SETLEASE on file descriptor F_GETSIG result: 10 calling SETLEASE on file descriptor F_GETSIG result: 0 done fcntl(F_SETLEASE) resets the signal number when it is called the second time, which is surprising, undocumented and thus very likely to be a bug. Note that as a result, SIGIO is raised instead of SIGUSR1, which breaks applications. Most importantly, it breaks Samba, which first calls F_SETSIG, then F_SETLEASE, if it fails with EACCES, it elevates its capabilities then tries again, if it succeeds, SIGIO will eventually be raised instead of the wanted signal, Samba doesn't handle unexpected SIGIO, crashes, game over. (see linux_setlease() in Samba source code source3/smbd/oplock_linux.c) As a note to Samba users, a workaround for this is to disable kernel oplocks (kernel oplocks = no in smb.conf).
As a related note, I also reported this in the Samba bugtracker as #8974: https://bugzilla.samba.org/show_bug.cgi?id=8974
Hi, This seems to be a regression introduced on version 2.6.37 of the kernel. I narrowed it down to these two commits causing the regression: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=3df057ac9afe83c4af84016df3baf3a0eb1d3d33 http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=8896b93f42459b18b145c69d399b62870df48061 Building the kernel up to the parent commit of the first of this pair (commit 21b75b019983dfa5c2dda588f4b60b4ca69844a4) doesn't reproduce the problem, the last F_GETSIG still returns 10... With the first commit only there's a bug in which the file_lock struct returned by __vfs_setlease() is the one being freed instead of the one just allocated (which needs to be freed as the other one is being reused.) Running the test case under the kernel up to that commit will crash the kernel with BUG at fs/locks.c:193 which is the check for the fl->fl_link being empty in locks_free_lock(). With both commits then the leasesig.c program will reproduce the issue, the last F_GETSIG will return 0, so it's clear that this pair of commits introduced the problem, the first kernel version they've made into was 2.6.37. The problem seems to be that, when a previous lease already existed, the recently allocated one needs to be freed, in doing so locks_free_lock() will be called which will call lease_release_private_callback() which sets signo back to 0... Not sure yet what the correct solution for this issue is, I'll have a closer look and try to come up with a patch to fix it, I'll let you know when I have something worth sharing. Cheers, Filipe
Patch posted to mailing list in this thread: http://thread.gmane.org/gmane.linux.kernel/1314059 A test with leasesig.c showed that it seems to fix the issue: $ uname -r 3.5.0-rc2-bug43336+ $ ./leasesig test opening test in read-only mode calling SETSIG SIGUSR1 on file descriptor F_GETSIG result: 10 calling SETLEASE on file descriptor F_GETSIG result: 10 calling SETLEASE on file descriptor F_GETSIG result: 10 done Cheers, Filipe
A patch referencing this bug report has been merged in Linux v3.6-rc1: commit 3b6e2723f32de42028617f2c99b244ccd72cd959 Author: Filipe Brandenburger <filbranden@gmail.com> Date: Fri Jul 27 00:42:52 2012 -0400 locks: prevent side-effects of locks_release_private before file_lock is initialized
A patch referencing a commit referencing this bug report has been merged in Linux v3.6-rc1: commit 068535f1fef4c90aee23eb7b9b9a71c5b72d7cd0 Author: J. Bruce Fields <bfields@redhat.com> Date: Wed Aug 1 07:56:16 2012 -0400 locks: remove unused lm_release_private