Bug 5956 - Linux client sets block=false in NLM_CANCEL requests
Summary: Linux client sets block=false in NLM_CANCEL requests
Status: CLOSED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: NFS
Hardware: i386 Linux
Importance: P2 normal
Assignee: Trond Myklebust
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-01-25 07:38 UTC by Jean-Louis ROCHETTE
Modified: 2006-03-04 13:58 UTC
CC List: 0 users

See Also:
Kernel Version: 2.6.9-1.667smp
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
Fix arguments to NLM_CANCEL call (2.64 KB, patch)
2006-01-25 09:16 UTC, Trond Myklebust

Description Jean-Louis ROCHETTE 2006-01-25 07:38:41 UTC
Most recent kernel where this bug did not occur:
Distribution: Fedora Core release 3 (Heidelberg)
Hardware Environment:
Software Environment: File system mounted with "mount -t nfs", with no
specific options.
Problem Description: A process with a blocked range lock request is killed.
The client then sends an NLM_CANCEL request to the server to cancel the
blocked range lock request, but it sets "block=false" in that NLM_CANCEL
request. As a result, the cancel request does not match the pending lock
request, which has block=true, and the server replies to the NLM_CANCEL with
NLM_DENIED status.
The NLM protocol says that the block field in NLM_CANCEL must be set to true.

Steps to reproduce: I use fcntl() to take a blocking range lock from 2
different processes. Then I use "kill -9 <pid>" to kill the process whose
lock request is blocked.
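
The steps above could be scripted roughly as follows. This is only a sketch,
not the reporter's actual test program: the helper names lock_range() and
kill_blocked_locker() are made up for illustration, it forks a child instead
of starting two separate programs, and the range (offset 10, length 20) is
taken from the trace below. It only generates NLM traffic when the path is on
an NFS mount; on a local file system it just exercises the same fcntl() calls.

```c
#include <fcntl.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Request an exclusive lock on len bytes starting at offset.
 * blocking != 0 uses F_SETLKW (wait for the lock), otherwise F_SETLK. */
static int lock_range(int fd, off_t offset, off_t len, int blocking)
{
    struct flock fl = {0};

    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = offset;
    fl.l_len = len;
    return fcntl(fd, blocking ? F_SETLKW : F_SETLK, &fl);
}

/* Hypothetical driver: the parent holds the lock, the child blocks on the
 * same range, and the parent then kills the child with SIGKILL.
 * Returns 1 if the child died from SIGKILL, 0 otherwise, -1 on setup error. */
static int kill_blocked_locker(const char *path)
{
    int status = 0;
    pid_t child;
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return -1;
    if (lock_range(fd, 10, 20, 0) != 0) {
        close(fd);
        return -1;
    }

    child = fork();
    if (child < 0) {
        close(fd);
        return -1;
    }
    if (child == 0) {
        /* fcntl() locks are per process, so this conflicts with the
         * parent's lock and blocks until the child is killed. */
        lock_range(fd, 10, 20, 1);
        _exit(0);
    }

    sleep(1);             /* crude: give the child time to block */
    kill(child, SIGKILL); /* same effect as "kill -9 <pid>" */
    waitpid(child, &status, 0);
    close(fd);
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL;
}
```

Calling kill_blocked_locker() on an existing file under the NFS mount while
capturing traffic should show the LOCK call with block: Yes followed by the
CANCEL call when the child is killed.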

Frame 1646 (266 bytes on wire, 266 bytes captured)
Internet Protocol, Src: 10.64.220.148 , Dst: 192.168.8.92
Transmission Control Protocol, Src Port: 796 , Dst Port: 59915
Remote Procedure Call, XID:0xfbd662da    // XID++ at each retry
Network Lock Manager Protocol
    V4 Procedure: LOCK (2)
    cookie: <DATA>
        length: 4
        contents: <DATA> // 0x35120000
    block: Yes
    exclusive: Yes
    lock
        caller_name: newlnxjlr
        fh
        owner: <DATA>
            contents: <DATA>   // "5065@newlnxjlr"
        svid: 5065
        l_offset: 10
        l_len: 20
    reclaim: No

Frame 1647 (106 bytes on wire, 106 bytes captured)
Internet Protocol, Src: 192.168.8.92 , Dst: 10.64.220.148
Transmission Control Protocol, Src Port: 59915 , Dst Port: 796
Remote Procedure Reply XID:0xfbd662da
Network Lock Manager Protocol
    V4 Procedure: LOCK (2)
    cookie: <DATA>
        length: 4
        contents: <DATA> // 0x35120000 -> 0x1235
    stat: NLM_BLOCKED (3)

Frame 1648 (258 bytes on wire, 258 bytes captured)
Internet Protocol, Src: 10.64.220.148 , Dst: 192.168.8.92
Transmission Control Protocol, Src Port: 796 , Dst Port: 59915
Remote Procedure Call, XID:0xfcd662da
Network Lock Manager Protocol
    V4 Procedure: CANCEL (3)
    cookie: <DATA>
        length: 4
        contents: <DATA> // 0x36120000 -> 0x1236
    block: No   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
    exclusive: Yes
    lock
        caller_name: newlnxjlr
        fh
        owner: <DATA>
            contents: <DATA>   // "5065@newlnxjlr"
        svid: 5065
        l_offset: 10
        l_len: 20

Frame 1649 (106 bytes on wire, 106 bytes captured)
Internet Protocol, Src: 192.168.8.92 , Dst: 10.64.220.148
Transmission Control Protocol, Src Port: 59915 , Dst Port: 796
Remote Procedure Reply XID:0xfcd662da
Network Lock Manager Protocol
    V4 Procedure: CANCEL (3)
    cookie: <DATA>
        length: 4
        contents: <DATA> // 0x36120000 -> 0x1236
    stat: NLM_DENIED (1)
Comment 1 Diego Calleja 2006-01-25 08:29:35 UTC
Could you try 2.6.15?
Comment 2 Trond Myklebust 2006-01-25 09:16:22 UTC
Created attachment 7142
Fix arguments to NLM_CANCEL call

 The OpenGroup docs state that the arguments "block", "exclusive" and
 "alock" must exactly match the arguments for the lock call that we are
 trying to cancel.
 Currently, "block" is always set to false, which is wrong.

 See bug# 5956 on bugzilla.kernel.org.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
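
To illustrate the matching rule described in this patch description, here is
a small stand-alone sketch. It is not the attached patch and not kernel code:
the struct and function names (nlm_args_sketch, make_cancel_args,
cancel_matches) are hypothetical, and the lock is reduced to svid, offset and
length (caller_name, fh and owner are left out). It assumes the server
compares block, exclusive and the lock fields of the cancel against the
pending request.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical view of the lock part of an NLM request. */
struct nlm_lock_sketch {
    int32_t  svid;
    uint64_t l_offset;
    uint64_t l_len;
};

/* Simplified, hypothetical view of the LOCK / CANCEL arguments. */
struct nlm_args_sketch {
    bool block;
    bool exclusive;
    struct nlm_lock_sketch lock;
};

/* Build CANCEL arguments by copying the pending LOCK arguments unchanged,
 * so that block, exclusive and the lock all match. */
static struct nlm_args_sketch
make_cancel_args(const struct nlm_args_sketch *pending)
{
    struct nlm_args_sketch cancel = *pending;
    return cancel;
}

/* Assumed server-side check: every field must match exactly. */
static bool cancel_matches(const struct nlm_args_sketch *pending,
                           const struct nlm_args_sketch *cancel)
{
    return pending->block == cancel->block &&
           pending->exclusive == cancel->exclusive &&
           pending->lock.svid == cancel->lock.svid &&
           pending->lock.l_offset == cancel->lock.l_offset &&
           pending->lock.l_len == cancel->lock.l_len;
}
```

With the values from the trace (block=true, exclusive=true, svid 5065, offset
10, length 20), a cancel built by make_cancel_args() matches the pending
request, while the same cancel with block forced to false does not.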
Comment 3 Trond Myklebust 2006-03-04 13:58:25 UTC
Patch was applied to 2.6.16-rc2
