Attached is a trace that results in data corruption. The client sends a sync WRITE immediately before connection loss. In this case, the server erroneously sends the NFS4_OK response immediately upon reconnect, which causes the client to release the write buffer for reuse. The client application then overwrites the buffer while the kernel is still in the middle of an RPC resend of the original write, so the resent write is corrupted. Trond notes that this is a known longstanding bug, so documenting it here. Although this instance is triggered by server misbehavior, it remains a potential issue with O_DIRECT writes whenever an RPC resend is triggered. Trond notes that this race would be hard to fix, as it would require communication between layers that does not currently happen. One potential solution would be, in the O_DIRECT case, to close the connection in situations where the client would otherwise do an RPC resend.
Created attachment 27811 [details] Trace immediately before connection loss. The client is in the middle of a write; several WRITEs are outstanding without a reply from the server when the connection is lost.
What should the server be doing? (I can't get wireshark to parse that trace for me, by the way.)
Created attachment 27821 [details] Trace immediately after connection loss. The server immediately sends back an (unsolicited) NFS4_OK to all outstanding WRITEs. The client soon sends a corrupted RPC replay of one of the outstanding WRITEs (xid=15..., but carrying data from a future WRITE, xid=28). The client then starts another corrupted RPC replay of a WRITE (xid=22, with data from future WRITE xid=29), but the connection is broken again before that replay is completely sent.
(In reply to comment #2)
> What should the server be doing?
>
> (I can't get wireshark to parse that trace for me, by the way.)

The server should wait for the client to send the replay, and reply to that. (Note the trace is from the Linux client, but not the Linux server.) Regarding parsing, I also find it does not parse unless I go into the wireshark menu Analyze->"Decode As" and change DCE-RPC to RPC. I am not sure why that is necessary.
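For non-interactive use, tshark's decode-as option can force the same interpretation from the command line. A minimal sketch, assuming the capture uses the standard NFS TCP port 2049 and with "trace.pcap" as a placeholder for the attached file:

```shell
# Force ONC-RPC dissection for traffic on the NFS port (2049 assumed),
# instead of the DCE-RPC heuristic wireshark otherwise picks.
tshark -r trace.pcap -d tcp.port==2049,rpc
```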