Bug 212295 - pipe deadlocks since kernel v5.8 after resizing (race condition)
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: fs_other
 
Reported: 2021-03-15 18:00 UTC by Lukas Schauer
Modified: 2022-06-20 13:52 UTC

Kernel Version: 5.8-latest
Tree: Mainline
Regression: Yes


Attachments
Code to reproduce the issue (1.25 KB, text/x-csrc)
2021-03-15 18:00 UTC, Lukas Schauer
Patch fixing the race condition (1022 bytes, application/mbox)
2021-03-16 03:14 UTC, Lukas Schauer
[PATCH] fs/pipe: wakeup wr_wait after setting max_usage (1.53 KB, application/mbox)
2021-03-24 14:29 UTC, Lukas Schauer

Description Lukas Schauer 2021-03-15 18:00:06 UTC
Created attachment 295871
Code to reproduce the issue

Hi,

Since kernel v5.8, I've been seeing pipes occasionally get stuck in a deadlock when they are resized.

A child process is stuck in pipe_read:

  [<0>] pipe_read+0x2ca/0x410
  [<0>] new_sync_read+0x18d/0x1a0
  [<0>] vfs_read+0xf1/0x180
  [<0>] ksys_read+0xb5/0xd0
  [<0>] do_syscall_64+0x33/0x80
  [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

While the parent process is stuck in the corresponding pipe_write:

  [<0>] pipe_write+0x274/0x5c0
  [<0>] new_sync_write+0x19c/0x1b0
  [<0>] vfs_write+0x184/0x250
  [<0>] ksys_write+0xb5/0xd0
  [<0>] do_syscall_64+0x33/0x80
  [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

The bug is only triggered if pipes get resized, which seemingly very few processes actually do.

A git bisect landed on the following commit:

  commit c73be61cede5882f9605a852414db559c0ebedfd
  Author: David Howells <dhowells@redhat.com>
  Date:   Tue Jan 14 17:07:11 2020 +0000

    pipe: Add general notification queue support

I've attached some code that reproduces the bug for me (it may take a few hundred iterations). Removing the fcntl call for F_SETPIPE_SZ makes the pipe_read/pipe_write deadlocks go away, so I guess the bug is somewhere in the resizing logic.
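
For readers without the attachment at hand, the resize step can be sketched roughly as below. This is not the attached reproducer, only a minimal illustration of the fcntl calls involved; resize_pipe is a hypothetical helper name, and the numeric fallbacks assume the Linux values of these commands.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes F_SETPIPE_SZ / F_GETPIPE_SZ in <fcntl.h> */
#endif
#include <fcntl.h>
#include <unistd.h>

/* Fallbacks with the Linux values, in case the headers did not expose them. */
#ifndef F_SETPIPE_SZ
#define F_SETPIPE_SZ 1031
#endif
#ifndef F_GETPIPE_SZ
#define F_GETPIPE_SZ 1032
#endif

/*
 * Hypothetical helper: ask the kernel to resize the pipe behind fd to at
 * least `bytes`, then return the capacity the kernel reports afterwards
 * (or -1 if the resize request failed).
 */
static int resize_pipe(int fd, int bytes)
{
    if (fcntl(fd, F_SETPIPE_SZ, bytes) < 0)
        return -1;
    return fcntl(fd, F_GETPIPE_SZ);
}
```

Calling this on either end of a pipe created with pipe() while another process is blocked in read or write on it is the kind of situation in which the deadlock shows up.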

Comment 1 Christian Schwarz 2021-03-15 22:49:51 UTC
I can reproduce the issue using the provided code.

Comment 2 Lukas Schauer 2021-03-16 03:14:14 UTC
Created attachment 295881
Patch fixing the race condition

I've found the race condition.

After a pipe is resized, a wakeup is issued for pipe_write before the max_usage value for that pipe is actually raised.

Depending on whether the pipe was full before resizing or not, this can result in a deadlock.
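
The ordering problem can be illustrated with a simplified, single-threaded userspace model. This is not kernel code: struct pipe_model and both functions are hypothetical names made up for this sketch, and the "wakeup" is modelled simply as the moment at which the writer re-checks its condition.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for the kernel's pipe state. */
struct pipe_model {
    unsigned int head;      /* next slot to be written */
    unsigned int tail;      /* next slot to be read */
    unsigned int max_usage; /* number of slots a writer may occupy */
};

/* A writer may proceed only while fewer than max_usage slots are in use. */
static bool model_writable(const struct pipe_model *p)
{
    return p->head - p->tail < p->max_usage;
}

/*
 * Model a resize to new_slots. The return value is what a sleeping writer
 * observes when it is woken and re-checks its condition:
 *  - wake_after_update == false: woken before max_usage is raised
 *  - wake_after_update == true:  woken after max_usage is raised
 */
static bool writer_proceeds_after_resize(struct pipe_model *p,
                                         unsigned int new_slots,
                                         bool wake_after_update)
{
    bool seen;

    if (wake_after_update) {
        p->max_usage = new_slots; /* raise the limit first */
        seen = model_writable(p); /* writer wakes and sees the new limit */
    } else {
        seen = model_writable(p); /* writer wakes, still sees the old limit */
        p->max_usage = new_slots; /* limit raised, but nobody is woken again */
    }
    return seen;
}
```

With a full pipe (for example head = 16, tail = 0, max_usage = 16) and a resize to 32 slots, the early wakeup leaves the writer asleep even though room is now available, while the late wakeup lets it continue.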

I've attached a patch for this to this issue. It's built against v5.8 because that's what I've been using for testing. If necessary, please let me know and I'll rebase it onto a newer version.

Comment 3 Lukas Schauer 2021-03-24 14:29:42 UTC
Created attachment 296031
[PATCH] fs/pipe: wakeup wr_wait after setting max_usage

I revised the patch to better address the regression instead of weirdly pasting code around and also sent it to the linux-kernel mailing list with Alan Cox and David Howells in Cc.

Comment 4 John Goerzen 2022-06-20 13:42:28 UTC
What is the current status of getting this merged? I recently encountered it in the wild. Thanks.
