Bug 200273

Summary: F_OFD_GETLK implemented wrong with CIFS protocol versions higher than 1.0 (breaks use of disk image on CIFS share with qemu 2.10+)
Product: File System
Reporter: Adam Williamson (adamw)
Component: CIFS
Assignee: fs_cifs (fs_cifs)
Status: NEW
Severity: normal
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 4.13+
Subsystem:
Regression: No
Bisected commit-id:

Description Adam Williamson 2018-06-25 17:13:08 UTC
This was reported downstream (Fedora) at https://bugzilla.redhat.com/show_bug.cgi?id=1484130 . The user-visible symptom is that running a qemu VM whose disk image lives on a CIFS share fails with qemu 2.10+, with the error 'Failed to get "write" lock'. (It only happens with qemu 2.10+ because the image locking behaviour was new in that qemu version.) Fan Zheng showed in https://bugzilla.redhat.com/show_bug.cgi?id=1484130#c8 (and elaborated in https://bugzilla.redhat.com/show_bug.cgi?id=1484130#c37 ) that this is really a kernel-side bug in the cifs driver: it does not implement F_OFD_GETLK the way it is documented to behave. To quote Fan:

"It is a kernel bug. The code snippet in comment 8 shows clearly that the kernel is doing the wrong thing, which cannot be fixed/worked around by QEMU.

In man 2 fcntl:

       F_OFD_GETLK (struct flock *)
              On input to this call, lock describes an open file description lock we would like to place on the file. If the lock could be placed, fcntl() does not
              actually place it, but returns F_UNLCK in the l_type field of lock and leaves the other fields of the structure unchanged. If one or more incompatible
              locks would prevent this lock being placed, then details about one of these locks are returned via lock, as described above for F_GETLK.

which is not the case with the new CIFS behaviour."
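
For anyone who wants to check a given mount, here is a minimal sketch of the kind of probe the man page text describes. It is not the snippet from the downstream comment 8 and not qemu's actual code; the helper name probe_ofd_lock, its parameters, and the choice of a one-byte range are assumptions made for illustration. On a file with no conflicting locks, the documented behaviour is that l_type comes back as F_UNLCK.

```c
#define _GNU_SOURCE   /* exposes F_OFD_GETLK in <fcntl.h> on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (name and signature are illustrative only).
 * Asks the kernel whether an OFD lock of `type` (F_RDLCK or F_WRLCK)
 * could be placed on the single byte at `offset` of the file at `path`.
 *
 * Returns the l_type written back by the kernel: F_UNLCK means the lock
 * could be placed, anything else describes a conflicting lock.
 * Returns -1 with errno set if open() or fcntl() fails.
 */
int probe_ofd_lock(const char *path, short type, off_t offset)
{
    assert(type == F_RDLCK || type == F_WRLCK);

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    struct flock fl;
    memset(&fl, 0, sizeof(fl));   /* l_pid must be 0 for OFD commands */
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = offset;
    fl.l_len = 1;

    int rc = fcntl(fd, F_OFD_GETLK, &fl);
    int saved_errno = errno;      /* close() may overwrite errno */
    int result = (rc == -1) ? -1 : fl.l_type;

    close(fd);
    errno = saved_errno;
    return result;
}
```

Calling probe_ofd_lock(path, F_WRLCK, 0) on an otherwise unlocked file should return F_UNLCK on a local filesystem. Running the same call against a file on a CIFS mount, once with vers=1.0 and once with a higher version, is one way to compare the two cases described in this report.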

We started noticing this issue after commit eef914a9, which changed the default protocol version for cifs from 1.0 to 3.0. Testing indicates the bug also happens with protocol version 2.0, so we believe the behaviour is wrong with protocol 2.0+ but not with 1.0.