Detailed description of the bug: https://lutz.donnerhacke.de/eng/Blog/Groundhog-Day-with-SMB-remount
Summary of the race condition:
1) Daisy-chained scheduling of the echo job creates a gap.
2) If traffic happens to arrive shortly after
the last echo, the planned echo is suppressed.
3) Due to the gap, the next echo transmission is delayed
until after the timeout, which is hard-coded to twice
the echo interval.
Possible fixes:
a) Eliminate the gap by scheduling at fixed times.
b) Send the echo requests in any case, regardless of other traffic.
c) Avoid the entire problem by waiting at least three times
the interval length.
For a quick fix, change the factor 2 to 3 in the timeout check.
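To make the timing argument easier to check, here is a minimal Python sketch of the race. It is a simplified model, not the kernel code: the function name `spurious_reconnect`, its parameters, and the numbers used below (60 s interval, 1.5 s gap, 1 s round trip) are assumptions chosen for illustration only. The echo job is assumed to run at multiples of `interval + gap`, and an echo is assumed to be skipped when a response was seen less than one interval ago.

```python
def spurious_reconnect(interval, gap, last_traffic, rtt, factor,
                       always_echo=False):
    """Return True if the timeout expires before the next echo reply arrives.

    Simplified model (all values in seconds, all assumptions):
      - the echo job fires at k * (interval + gap), k = 1, 2, ...
        (gap = 0 models scheduling at fixed times)
      - the echo is skipped if a response was seen less than
        `interval` seconds ago, unless always_echo is set
      - the connection is declared dead once no response has been
        seen for factor * interval seconds
    """
    last_response = last_traffic
    deadline = last_response + factor * interval
    k = 1
    while True:
        now = k * (interval + gap)
        if now > deadline:
            # The timeout expired before an echo was even sent.
            return True
        if always_echo or now - last_response >= interval:
            # The echo goes out now; its reply must beat the deadline.
            return now + rtt > deadline
        # Echo suppressed because of recent traffic; wait for the next run.
        k += 1


# Regular traffic 2 s after the last echo, daisy chaining adds 1.5 s per run.
print(spurious_reconnect(60, 1.5, 2, 1, factor=2))   # timeout at 2x interval
print(spurious_reconnect(60, 1.5, 2, 1, factor=3))   # fix c): 3x interval
print(spurious_reconnect(60, 0.0, 2, 1, factor=2))   # fix a): fixed schedule
print(spurious_reconnect(60, 1.5, 2, 1, factor=2, always_echo=True))  # fix b)
```

In this model only the first call reports a spurious reconnect: the echo at 61.5 s is suppressed, the next one is due at 123 s, and the timeout already expired at 122 s. Each of the three proposals removes the problem for these example values.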
I was advised to open a similar ticket on the Samba side: https://bugzilla.samba.org/show_bug.cgi?id=13795
Is there anything unclear in this ticket?
What are the reasons for rejecting the recommended quick fix?
Do you need a patch file?
It seems you have done quite an investigation. I would sure like to test it, but it is not easy to get a patched Ubuntu kernel.
Could it be that 202903 is a duplicate of this bug?
Ronnie posted a fix for this to the linux-cifs mailing list, which I have tentatively merged into cifs-2.6.git for-next.
Let me know if this works OK for you.
If you are referring to https://marc.info/?l=linux-cifs&m=156235998723334&w=2, the patch looks promising. It follows the "quick fix" proposal and AFAIK the problem vanishes with this patch.
But the patch for the comment section is wrong. Please keep the original text.
The correct comment would be:
712 * We need to wait 3 echo intervals to make sure we handle such
713 * situations right:
714 * 1s client sends a normal SMB request
715 * 2s client gets a response
716 * 30s echo workqueue job pops, and decides we got a response recently
712 refers to the wait interval.
715 refers to the hard-coded delay (one second).
I have sent a fix to the mailing list to update the comment.
Will close the bug, as the code change is upstream now.