Created attachment 295871
Code to reproduce the issue
Since kernel v5.8, I've been hitting a bug where pipes occasionally get stuck in a deadlock if they are resized.
A child process is stuck in pipe_read, while the parent process is stuck in the corresponding pipe_write.
The bug is only triggered if pipes get resized, which very few processes seem to actually do.
A git bisect landed on the following commit:
Author: David Howells <firstname.lastname@example.org>
Date: Tue Jan 14 17:07:11 2020 +0000
pipe: Add general notification queue support
I've attached some code that reproduces the bug for me (it may take a few hundred iterations). Removing the fcntl() call with F_SETPIPE_SZ makes the pipe_read/pipe_write deadlocks disappear, so I suspect the bug is somewhere in the resizing logic.
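The attached reproducer is not included here, so the following is only a minimal sketch of the resizing step it relies on: creating a pipe and growing it with fcntl(F_SETPIPE_SZ). The helper name make_resized_pipe and the requested size are hypothetical; F_SETPIPE_SZ and F_GETPIPE_SZ are the Linux-specific fcntl commands, and the kernel may round the requested size up.

```c
#define _GNU_SOURCE /* F_SETPIPE_SZ / F_GETPIPE_SZ are Linux-specific */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: create a pipe and resize it to at least `size` bytes.
 * Returns the capacity actually set by the kernel, or -1 on error. */
static int make_resized_pipe(int fds[2], int size)
{
    if (pipe(fds) == -1) {
        perror("pipe");
        return -1;
    }
    /* On success F_SETPIPE_SZ returns the real capacity, which may be
     * larger than requested because the kernel rounds it up. */
    int got = fcntl(fds[1], F_SETPIPE_SZ, size);
    if (got == -1) {
        perror("fcntl(F_SETPIPE_SZ)");
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    return got;
}
```

A reproducer along these lines would presumably resize the pipe while a reader and a writer are using it across fork(); per the report, removing exactly this fcntl() call makes the deadlock go away.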
I can reproduce the issue using the provided code.
Created attachment 295881
Patch fixing the race condition
I've found the race condition.
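After a pipe is resized, a wakeup is issued for writers waiting in pipe_write before the pipe's max_usage value is actually raised.
Depending on whether the pipe was full before the resize, this can result in a deadlock: a woken writer re-checks against the old max_usage, finds the pipe still full, and goes back to sleep, and nothing wakes it again once max_usage has been raised.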
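The ordering problem can be illustrated with a small userspace model using pthreads. This is not kernel code: struct model_pipe, model_write and model_resize are hypothetical names, a mutex and condition variable stand in for the pipe lock and the wr_wait queue, and only the writer side of the deadlock is modeled. The resize follows the order suggested by the revised patch title: set max_usage first, then wake writers.

```c
#include <pthread.h>

/* Hypothetical userspace model of a pipe's slot accounting; NOT kernel code.
 * `usage` counts occupied slots, `max_usage` is the current capacity. */
struct model_pipe {
    pthread_mutex_t lock;
    pthread_cond_t wr_wait;   /* writers sleep here while the pipe is full */
    unsigned int usage;
    unsigned int max_usage;
};

#define MODEL_PIPE_INIT(used, max) \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, (used), (max) }

/* Writer: sleep until a slot is free, then occupy it. */
static void model_write(struct model_pipe *p)
{
    pthread_mutex_lock(&p->lock);
    while (p->usage >= p->max_usage)
        pthread_cond_wait(&p->wr_wait, &p->lock);
    p->usage++;
    pthread_mutex_unlock(&p->lock);
}

/* Resize: publish the new limit first, then wake writers, so a woken
 * writer re-checks against the larger max_usage. Broadcasting before
 * the update would let a writer see the old limit, go back to sleep,
 * and never be woken for the capacity increase (a lost wakeup). */
static void model_resize(struct model_pipe *p, unsigned int new_max)
{
    pthread_mutex_lock(&p->lock);
    p->max_usage = new_max;
    pthread_mutex_unlock(&p->lock);
    pthread_cond_broadcast(&p->wr_wait);
}
```

With a full model pipe (usage == max_usage), a writer thread blocked in model_write() completes once model_resize() raises the limit, whichever thread runs first.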
I've attached a patch for this to this issue. It's built against v5.8 because that's what I've been using for testing. If necessary, let me know and I'll rebase it onto a newer version.
Created attachment 296031
[PATCH] fs/pipe: wakeup wr_wait after setting max_usage
I revised the patch to address the regression more cleanly instead of awkwardly moving code around, and also sent it to the linux-kernel mailing list with Alan Cox and David Howells in Cc.
What is the current status of getting this merged? I recently encountered it in the wild. Thanks.
I was recently added to this report. Does anyone still care about it? If so, it would be good if that person could confirm that this still happens with mainline.
Side note: there is currently another patch that fixes a different problem introduced by the culprit commit mentioned above, but it might be totally unrelated.
Yes, I do still care. I have to maintain ugly and imperfect workarounds in various shell scripts, e.g.:
command1 | (sleep 2; command2)
I don't *think* that the patch you reference fixes this, because I don't believe splice is required to trigger this bug. But I am not certain of this.
A fix for this is now in:
I see no need to expedite this, since the bug has been around since v5.8, so it will be fixed when vfs.misc is merged during the v6.8 merge window.
Christian, many thanks for taking care of this. And FWIW, I totally agree with "no need to expedite this".
Side note: when Sam James added me to this report, he also added me to two other regressions related to pipes and splice:
Those are even older (and the second one might not really qualify as a regression, since it was reported against a vendor kernel), and I have no idea whether they still happen. Just thought you might want to know.