Detailed description of the bug: https://lutz.donnerhacke.de/eng/Blog/Groundhog-Day-with-SMB-remount

Summary of the race condition:
1) Daisy-chained scheduling creates a gap.
2) If traffic arrives at an unfortunate moment shortly after the last echo, the planned echo is suppressed.
3) Due to the gap, the next echo transmission is delayed until after the timeout, which is hard-coded to twice the echo interval.

Possible solutions:
a) Eliminate the gap by scheduling at fixed times.
b) Send the echo requests in any case, regardless of other traffic.
c) Avoid the entire problem by waiting at least three times the interval length.

For a quick fix, change the 2 to a 3 in the code: https://github.com/torvalds/linux/blob/master/fs/cifs/connect.c#L712
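To make the timing in the summary concrete, here is a minimal sketch of the race as a toy simulation in Python. It is not the kernel code: the function name `first_timeout`, its parameters, and the simplifications (responses arrive instantly, exactly one unrelated response on an otherwise idle connection, the timeout is only evaluated when an echo job runs) are assumptions made for illustration.

```python
def first_timeout(interval, gap, traffic_at, timeout_factor,
                  always_echo=False, jobs=10):
    """Toy model of the echo race; returns the time at which the
    connection is declared dead, or None if it survives.

    interval       -- echo interval in seconds
    gap            -- extra delay added per cycle by daisy-chained scheduling
    traffic_at     -- time of the single unrelated response (None for no traffic)
    timeout_factor -- timeout expressed in echo intervals (2 before the fix)
    always_echo    -- model solution (b): never suppress the echo
    """
    last_response = 0      # the echo at t=0 was answered immediately
    pending_traffic = traffic_at
    job_time = 0
    for _ in range(jobs):
        # Daisy chaining: each job is queued only after the previous one
        # ran, so the schedule drifts by `gap` every cycle.
        job_time += interval + gap
        if pending_traffic is not None and pending_traffic <= job_time:
            last_response = pending_traffic
            pending_traffic = None
        deadline = last_response + timeout_factor * interval
        if job_time > deadline:
            return deadline            # timeout fires before the next echo
        if not always_echo and job_time < last_response + interval:
            continue                   # recent response seen: echo suppressed
        last_response = job_time       # echo sent and answered at once
    return None


print(first_timeout(60, 2, 3, 2))                    # 2 intervals, with gap
print(first_timeout(60, 2, 3, 3))                    # solution (c)
print(first_timeout(60, 0, 3, 2))                    # solution (a)
print(first_timeout(60, 2, 3, 2, always_echo=True))  # solution (b)
```

With a 60 s interval, a 2 s gap and a response 3 s after the last echo, the job at 62 s is suppressed and the next one would only run at 124 s, after the 123 s deadline. In this model each of the three proposed solutions avoids the timeout.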
I was advised to open a similar ticket on the Samba side: https://bugzilla.samba.org/show_bug.cgi?id=13795
Is there anything unclear in this ticket? What are the reasons for rejecting the recommended quick fix? Do you need a patch file?
Hi Lutz, it seems you have done quite an investigation. I would certainly like to test it, but it is not easy to get a patched Ubuntu kernel. Could it be that 202903 is a duplicate of this bug?
Ronnie posted a fix for this to the linux-cifs mailing list, which I have tentatively merged into cifs-2.6.git for-next. Let me know if this works OK for you.
Thank you. If you are referring to https://marc.info/?l=linux-cifs&m=156235998723334&w=2 the patch looks promising. It follows the "quick fix" proposal and AFAIK the problem vanishes using this patch. But the patch for the comment section is wrong. Please keep the original text.
Correct comment would be:
712 * We need to wait 3 echo intervals to make sure we handle such
713 * situations right:
714 * 1s client sends a normal SMB request
715 * 2s client gets a response
716 * 30s echo workqueue job pops, and decides we got a response recently

712 refers to the wait interval. 715 refers to the hard coded delay (one second).
I have sent a fix to the mailing list to update the comment. Will close the bug as we have the code change in upstream now.