Bug 218382
| Summary: | Possible discrepancy in CREATE_SESSION slot number accounting | | |
|---|---|---|---|
| Product: | File System | Reporter: | Connor Smith (connor.smith) |
| Component: | NFS | Assignee: | Chuck Lever (cel) |
| Status: | RESOLVED CODE_FIX | | |
| Severity: | normal | CC: | jlayton |
| Priority: | P3 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
Description
Connor Smith
2024-01-16 14:30:52 UTC
I originally sent the above to the linux-nfs mailing list, where Chuck Lever replied...
> On first blush, your interpretation of S18.36.4 looks correct
> to me. We need to study this further and also look into how
> pynfs treats CREATE_SESSION retransmits.
... and suggested that I file this bug report.
There is possible ambiguity in Section 18.36.4. There is a single CREATE_SESSION slot and sequence number per client. Thus it would be a short leap to assume that these are part of the "client record" that is frequently referred to in this section.

The description of Phase 2 of the CREATE_SESSION operation says:

> If csa_sequenceid is equal to the slot's sequence ID + 1 (accounting
> for wraparound), then the slot's sequence ID is set to csa_sequenceid,
> and the CREATE_SESSION processing goes to the next phase.

Which can be interpreted as an update to that client's record. But later in Phase 3, the description of "Successful Confirmation" says:

> If the session is not successfully created, then no changes are made
> to any client records on the server.

and the description of "Unsuccessful Confirmation" likewise says:

> Neither of these cases is permissible. Processing stops and
> NFS4ERR_CLID_INUSE is returned to the client. No changes are made to
> any client records on the server.

It's possible that Bruce read this to mean that the slot sequence ID, as part of the client's record, is not updated in these two cases (including NFS4ERR_CLID_INUSE).

There is growing consensus that Section 18.35.4 clearly defines a client record, and it does /not/ contain any session-related details.

I've prototyped a fix for this issue, but now NFSD FAILS pynfs CSESS6. This test checks whether a CREATE_SESSION that fails with NFS4ERR_CLID_INUSE is cached by sending a CREATE_SESSION with the wrong credential (leaving the client ID unconfirmed) and then replaying that CREATE_SESSION using the same CS slot sequence number. This means that either the test is wrong or our understanding of the spec is wrong. A colleague confirmed that the Solaris NFS server passes CSESS6.
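The Phase 2 check quoted above can be sketched in a few lines of Python. This is only an illustration, not NFSD's code: the helper name `check_cs_seqid` and its result strings are hypothetical, and the handling of the "equal" and "anything else" cases is an assumption drawn from a reading of RFC 8881 Section 18.36.4. Only the `+ 1` rule is quoted in this report.

```python
U32_MASK = 0xFFFFFFFF  # sequence IDs are unsigned 32-bit values


def check_cs_seqid(slot_seqid: int, csa_sequenceid: int) -> str:
    """Classify a CREATE_SESSION sequence ID against the client's CS slot.

    Hypothetical helper for illustration; names and results are assumptions.
    """
    if csa_sequenceid == ((slot_seqid + 1) & U32_MASK):
        # Quoted rule: slot's sequence ID + 1, accounting for wraparound.
        return "next"
    if csa_sequenceid == slot_seqid:
        # Assumed: same sequence ID means a retransmitted request.
        return "replay"
    # Assumed: any other value maps to NFS4ERR_SEQ_MISORDERED.
    return "misordered"


print(check_cs_seqid(7, 8))            # next in sequence
print(check_cs_seqid(0xFFFFFFFF, 0))   # next in sequence after wraparound
print(check_cs_seqid(7, 7))            # same sequence ID again
print(check_cs_seqid(7, 9))            # skipped ahead
```

The open question in this report is not the comparison itself but whether the slot update made in the "next" case survives when Phase 3 later fails.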
Given the way the implementation guidance in RFC 8881 Section 18.36.4 is broken into phases and states the error code behavior in terms of BCP14 compliance keywords, I still believe that SEQ_MISORDERED is the correct error code. The sequence number check is supposed to be done first, so that:

1. The first CREATE_SESSION, even though unsuccessful, still advances the CS slot sequence number on the server.
2. The second CREATE_SESSION will fail early because the client presents an old slot sequence number.

Fixed by commit e4469c6cc69b ("NFSD: Fix the NFSv4.1 CREATE_SESSION operation") and 99dc2ef0397d ("NFSD: CREATE_SESSION must never cache NFS4ERR_DELAY replies"), both merged in v6.9-rc.
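The two numbered steps in the argument above can be traced with a toy model. It is a sketch under stated assumptions, not the code from the commits: `ToyClient`, `create_session`, and the credential strings are hypothetical, and the model assumes that no reply is cached for the failed request, so a repeated sequence ID is reported as misordered instead of being replayed.

```python
from dataclasses import dataclass

U32_MASK = 0xFFFFFFFF  # sequence IDs are unsigned 32-bit values


@dataclass
class ToyClient:
    """Hypothetical per-client state: one CREATE_SESSION slot, unconfirmed ID."""
    cs_slot_seqid: int
    owner_cred: str


def create_session(client: ToyClient, csa_sequenceid: int, cred: str) -> str:
    # The sequence number check runs first, before confirmation.
    if csa_sequenceid != ((client.cs_slot_seqid + 1) & U32_MASK):
        # Assumption: nothing was cached for the earlier failure, so an old
        # sequence ID is rejected here rather than replayed.
        return "NFS4ERR_SEQ_MISORDERED"
    # The slot advances even if the later confirmation step fails.
    client.cs_slot_seqid = csa_sequenceid
    # Confirmation fails when the credential does not match.
    if cred != client.owner_cred:
        return "NFS4ERR_CLID_INUSE"
    return "NFS4_OK"


client = ToyClient(cs_slot_seqid=0, owner_cred="alice")
first = create_session(client, 1, cred="mallory")   # wrong credential
second = create_session(client, 1, cred="mallory")  # same sequence ID again
print(first, second)
```

Under these assumptions the first call returns NFS4ERR_CLID_INUSE and leaves the slot at 1, so the second call, which presents 1 again, is rejected by the sequence check before the credential is ever examined.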