Bug 16911 - Data corruption due to race between rpc-resend of O_DIRECT WRITE and server OK response.
Status: RESOLVED OBSOLETE
Alias: None
Product: File System
Classification: Unclassified
Component: NFS
Hardware: All Linux
Importance: P1 normal
Assignee: Trond Myklebust
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-08-24 14:54 UTC by Fred Isaman
Modified: 2012-08-13 15:59 UTC

See Also:
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Trace immediately before connection loss (214.01 KB, application/octet-stream)
2010-08-24 14:56 UTC, Fred Isaman
Trace immediately after connection loss (131.22 KB, application/octet-stream)
2010-08-24 15:03 UTC, Fred Isaman

Description Fred Isaman 2010-08-24 14:54:05 UTC
Attached is a trace which results in data corruption.  The client
sends a sync WRITE immediately before connection loss.  In this case,
the server erroneously sends the NFS4_OK response immediately upon
reconnect, which causes the client to release the write buffer for
reuse.  The client application then overwrites the buffer while the
kernel is still in the middle of an RPC resend of the original WRITE,
so the resent data is corrupted.

Trond notes that this is a known, longstanding bug, so I am documenting it here.
Although it is triggered here by server misbehavior, it remains a
potential issue for O_DIRECT writes whenever an RPC resend is
triggered.  Trond notes that this race would be hard to fix, as it
would require communication between layers that does not currently exist.
One potential solution would be, in the case of O_DIRECT, to close the
connection in situations where the client would otherwise do an RPC resend.
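
The race described above can be illustrated with a small user-space model. This is only a sketch, not kernel code: the class `DirectWriteRpc`, the function `simulate_race`, the xid value, and the buffer contents are all hypothetical. The model assumes the in-flight WRITE references the application's buffer without copying it, which is what makes the resend sensitive to the buffer being reused.

```python
# Toy model of the race; every name and value here is hypothetical.

class DirectWriteRpc:
    """An in-flight WRITE whose payload references the caller's buffer (no copy)."""

    def __init__(self, xid, user_buffer):
        self.xid = xid
        # memoryview shares memory with user_buffer, modelling zero-copy O_DIRECT.
        self.payload = memoryview(user_buffer)
        self.completed = False

    def transmit(self, wire, start, stop):
        """Copy payload bytes [start, stop) onto the wire, like a partial send."""
        wire.extend(self.payload[start:stop])


def simulate_race():
    user_buffer = bytearray(b"AAAAAAAA")  # data for the original WRITE
    rpc = DirectWriteRpc(xid=1, user_buffer=user_buffer)

    # After reconnect, the client starts resending the WRITE.
    wire = bytearray()
    rpc.transmit(wire, 0, 4)

    # The server's early OK arrives mid-resend, so the write is reported
    # complete and the buffer is released to the application.
    rpc.completed = True

    # The application reuses the buffer for its next write (same length).
    user_buffer[:] = b"BBBBBBBB"

    # The resend continues, now reading the application's new data.
    rpc.transmit(wire, 4, 8)
    return bytes(wire)


if __name__ == "__main__":
    print(simulate_race())
```

In this model the resent request carries a mix of the original data and the data of the later write, which corresponds to the corrupted replay seen in the trace.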
Comment 1 Fred Isaman 2010-08-24 14:56:31 UTC
Created attachment 27811 [details]
Trace immediately before connection loss

The client is in the middle of a write; several WRITEs are outstanding, with no reply from the server, when the connection is lost.
Comment 2 bfields 2010-08-24 14:59:23 UTC
What should the server be doing?

(I can't get wireshark to parse that trace for me, by the way.)
Comment 3 Fred Isaman 2010-08-24 15:03:26 UTC
Created attachment 27821 [details]
Trace immediately after connection loss

Server immediately sends back (unsolicited) OK to all outstanding WRITEs.  The client soon sends a corrupted RPC replay of one of the outstanding WRITEs (xid=15..., but with data from a future WRITE xid=28).  The client then starts another corrupted RPC replay of a WRITE (xid=22, data from future WRITE xid=29), but the connection is broken again before that is completely sent.
Comment 4 Fred Isaman 2010-08-24 15:10:20 UTC
(In reply to comment #2)
> What should the server be doing?
> 
> (I can't get wireshark to parse that trace for me, by the way.)

The server should wait for the client to send the replay, and reply to that.  (Note that the trace is from the Linux client, but not the Linux server.)

Regarding parsing, I also find that it does not parse unless I go into the Wireshark menu Analyze->"Decode As" and change DCE-RPC to RPC.  I am not sure why that is necessary.
