Bug 43336

Summary: fcntl(F_SETLEASE) resets signal number when called multiple times
Product: File System Reporter: Etienne Dechamps (etienne)
Component: VFSAssignee: fs_vfs
Status: RESOLVED CODE_FIX    
Severity: normal CC: alan, bjoern, filbranden, florian
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 3.2.10 Subsystem:
Regression: No Bisected commit-id:
Attachments: Test program

Description Etienne Dechamps 2012-06-03 22:16:57 UTC
Created attachment 73503 [details]
Test program

See the attached C program:

    $ gcc -o leasesig leasesig.c 
    $ ./leasesig foo
    opening foo in read-only mode
    calling SETSIG SIGUSR1 on file descriptor
    F_GETSIG result: 10
    calling SETLEASE on file descriptor
    F_GETSIG result: 10
    calling SETLEASE on file descriptor
    F_GETSIG result: 0
    done

fcntl(F_SETLEASE) resets the signal number when it is called the second time, which is surprising, undocumented and thus very likely to be a bug.

Note that as a result, SIGIO is raised instead of SIGUSR1, which breaks applications. Most importantly, it breaks Samba, which first calls F_SETSIG, then F_SETLEASE, if it fails with EACCES, it elevates its capabilities then tries again, if it succeeds, SIGIO will eventually be raised instead of the wanted signal, Samba doesn't handle unexpected SIGIO, crashes, game over. (see linux_setlease() in Samba source code source3/smbd/oplock_linux.c)

As a note to Samba users, a workaround for this is to disable kernel oplocks (kernel oplocks = no in smb.conf).
Comment 1 Etienne Dechamps 2012-06-03 22:41:05 UTC
As a related note, I also reported this in the Samba bugtracker as #8974: https://bugzilla.samba.org/show_bug.cgi?id=8974
Comment 2 Filipe Brandenburger 2012-06-13 04:14:25 UTC
Hi,

This seems to be a regression introduced on version 2.6.37 of the kernel.

I narrowed it down to these two commits causing the regression:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=3df057ac9afe83c4af84016df3baf3a0eb1d3d33
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=8896b93f42459b18b145c69d399b62870df48061

Building the kernel up to the parent commit of the first of this pair (commit 21b75b019983dfa5c2dda588f4b60b4ca69844a4) doesn't reproduce the problem, the last F_GETSIG still returns 10...

With the first commit only there's a bug in which the file_lock struct returned by __vfs_setlease() is the one being freed instead of the one just allocated (which needs to be freed as the other one is being reused.) Running the test case under the kernel up to that commit will crash the kernel with BUG at fs/locks.c:193 which is the check for the fl->fl_link being empty in locks_free_lock().

With both commits then the leasesig.c program will reproduce the issue, the last F_GETSIG will return 0, so it's clear that this pair of commits introduced the problem, the first kernel version they've made into was 2.6.37.

The problem seems to be that, when a previous lease already existed, the recently allocated one needs to be freed, in doing so locks_free_lock() will be called which will call lease_release_private_callback() which sets signo back to 0...

Not sure yet what the correct solution for this issue is, I'll have a closer look and try to come up with a patch to fix it, I'll let you know when I have something worth sharing.

Cheers,
Filipe
Comment 3 Filipe Brandenburger 2012-06-16 03:09:29 UTC
Patch posted to mailing list in this thread:
http://thread.gmane.org/gmane.linux.kernel/1314059

A test with leasesig.c showed that it seems to fix the issue:

$ uname -r
3.5.0-rc2-bug43336+
$ ./leasesig test 
opening test in read-only mode
calling SETSIG SIGUSR1 on file descriptor
F_GETSIG result: 10
calling SETLEASE on file descriptor
F_GETSIG result: 10
calling SETLEASE on file descriptor
F_GETSIG result: 10
done

Cheers,
Filipe
Comment 4 Florian Mickler 2012-08-04 19:08:57 UTC
A patch referencing this bug report has been merged in Linux v3.6-rc1:

commit 3b6e2723f32de42028617f2c99b244ccd72cd959
Author: Filipe Brandenburger <filbranden@gmail.com>
Date:   Fri Jul 27 00:42:52 2012 -0400

    locks: prevent side-effects of locks_release_private before file_lock is initialized
Comment 5 Florian Mickler 2012-08-05 11:17:53 UTC
A patch referencing a commit referencing this bug report has been merged in Linux v3.6-rc1:

commit 068535f1fef4c90aee23eb7b9b9a71c5b72d7cd0
Author: J. Bruce Fields <bfields@redhat.com>
Date:   Wed Aug 1 07:56:16 2012 -0400

    locks: remove unused lm_release_private