Bug 216458 - SIGIO raised with si_code==POLL_IN on pipe FD with no writers when 0 bytes are in input buffer, no SIGIO with si_code==POLL_HUP
Summary: SIGIO raised with si_code==POLL_IN on pipe FD with no writers when 0 bytes ar...
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: Other (show other bugs)
Hardware: All Linux
: P1 high
Assignee: fs_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-09-07 12:51 UTC by Jason Vas Dias
Modified: 2023-07-07 23:20 UTC (History)
4 users (show)

See Also:
Kernel Version: v5.19.7
Subsystem:
Regression: No
Bisected commit-id:


Attachments
patch against fs/pipe.c of v5.19.6 (1.06 KB, patch)
2022-09-07 13:08 UTC, Jason Vas Dias
Details | Diff
improved patch to raise SIGIO with si_code==POLL_HUP in fs/pipe.c (1.28 KB, patch)
2022-09-07 13:30 UTC, Jason Vas Dias
Details | Diff
improved patch to raise SIGIO with si_code==POLL_HUP in fs/pipe.c (1.39 KB, patch)
2022-09-07 14:17 UTC, Jason Vas Dias
Details | Diff
improved patch to raise SIGIO with si_code==POLL_HUP in fs/pipe.c (1.69 KB, patch)
2022-09-07 15:47 UTC, Jason Vas Dias
Details | Diff
testcase : prints FAILURE under unmodified v5.19.6 (5.47 KB, text/plain)
2022-09-07 18:16 UTC, Jason Vas Dias
Details
improved patch to raise SIGIO with si_code==POLL_HUP in fs/pipe.c (2.69 KB, patch)
2022-09-07 20:17 UTC, Jason Vas Dias
Details | Diff
testcase: t_sigio.c (5.87 KB, text/x-csrc)
2022-09-09 14:42 UTC, Jason Vas Dias
Details
Patch against RHEL8 (Rocky Linux) v4.18.0-372-26.1 / Stable Tree (3.33 KB, patch)
2022-10-04 12:24 UTC, Jason Vas Dias
Details | Diff

Description Jason Vas Dias 2022-09-07 12:51:16 UTC
I and probably many other users are having problems
detecting with precision when the write end of a
readable pipe file descriptor is closed - ie. 
a read-only fd for which 
  S_ISFIFO(st.st_mode) 
is true after fstat(fd, &st), which has 1 writing process
- using only SIGIO events, when I believe it should 
be possible to do so with a very simple, trivial fix 
to linux's
  fs/pipe.c
, which does NOT under any circumstances raise POLL_HUP,
even when it detects there are no writers on a readable
pipe fd, or it has detected it has written NO bytes
to the input buffer - in these cases, a SIGIO with
si_code==POLL_IN is raised, when NO bytes are in
the input buffer, and NO si_code==POLL_HUP SIGIO is
raised; then users must use:
  ioctl(fd, FIONREAD, &sz),
which in these cases returns 0 bytes - imagine,
a POLL_IN raised for 0 bytes in input buffer, 
when it has detected the pipe has no writers,
instead of POLL_HUP !  This is really wrong / bad .

There is then a race condition - how to detect if
the writer program has written its final chunk of
output and closed the pipe, so that only ONE
final SIGIO is received by the writer, for a
non-zero size in the input buffer, NOT the
one with no data - then no final SIGIO will
be received and the program can hang forever
in pause(2) waiting for input on a pipe with no writers.

The poll(2) event EPOLLHUP is set in revents IFF one
uses poll(2), (fs/pipe.c's pipe_poll() function),
which I am not - my program just receives SIGIO signals,
in a signal handler with SA_SIGINFO | SA_ONSTACK
sigaction flags set, so it gets a siginfo_t structure
with si_code set to one of (from sigaction manual page):
"
       The  following values can be placed in si_code for a SIGIO/SIGPOLL sig‐
       nal:

           POLL_IN
                  Data input available.

           POLL_OUT
                  Output buffers available.

           POLL_MSG
                  Input message available.

           POLL_ERR
                  I/O error.

           POLL_PRI
                  High priority input available.

           POLL_HUP
                  Device disconnected.

"

 So, IMHO, fs/pipe.c's failure to raise ANY 'POLL_HUP' SIGIO
 is in violation of this manual page promise to raise POLL_HUP
 SIGIOs whenever a 'Device disconnected' event occurs - 
 I suggest a read-only pipe FD with no writers should fit
 into that category.

 Furthermore, I believe the final POLL_IN SIGIO raised by
 poll_write, even when it detects it wrote 0 bytes 
 to the output buffer, is incorrect and against all 
 extant documentation on POLL_IN : 
   POLL_IN must ONLY be raised when input is actually
   available in the input buffer .

 Also, if POLL_IN detects there are no writers, why is it
 raising POLL_OUT ?
 
 Please, improve fs/pipe.c for Linux 6.0 - see:
  https://lore.kernel.org/lkml/hh8rmxiims.fsf@jvdspc.jvds.net/

 I am appending a patch today that fixes this for Fedora-36 / Rawhide -
 the latest v5.19.7 tarball from kernel.org with Fedora patches from
 v5.19.4 applied.
Comment 1 Jason Vas Dias 2022-09-07 13:08:45 UTC
Created attachment 301761 [details]
patch against fs/pipe.c of v5.19.6

This fixes fs/pipe.c to raise SIGIO with si_code==POLL_HUP
if there are async_readers and !pipe->writers in poll_read(),
or if no bytes were written and the pipe buffer is empty
and there are now no writers in pipe_write() .
Comment 2 Jason Vas Dias 2022-09-07 13:12:36 UTC
Building & testing this patch against v5.19.7 built
by rpmbuild with v5.18.6 Fedora spec file now -
will post back testing results.
Comment 3 Jason Vas Dias 2022-09-07 13:30:00 UTC
Created attachment 301762 [details]
improved patch to raise SIGIO with si_code==POLL_HUP in fs/pipe.c
Comment 4 Jason Vas Dias 2022-09-07 14:17:35 UTC
Created attachment 301763 [details]
improved patch to raise SIGIO with si_code==POLL_HUP in fs/pipe.c

This is final patch being built & tested with Fedora 36 derived spec file
(--with dbgonly).
Comment 5 Jason Vas Dias 2022-09-07 14:29:33 UTC
Whoops! : build log contains:
> + /usr/bin/make -s 'HOSTCFLAGS=-O2  -fexceptions -g -grecord-gcc-switches
> -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2
> -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1
> -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64 
> -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection
> -fcf-protection' 'HOSTLDFLAGS=-Wl,-z,relro -Wl,--as-needed  -Wl,-z,now
> -specs=/usr/lib/rpm/redhat/redhat-hardened-ld
> -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -Wl,--build-id=sha1
> -Wl,-dT,/home/jvd/rpmbuild/BUILD/kernel-5.19.7/.package_note-kernel-5.19.7-200.fc36.x86_64.ld'
> ARCH=x86_64 KCFLAGS= WITH_GCOV=0 -j12 bzImage
fs/pipe.c: In function 'pipe_read':
fs/pipe.c:385:9: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
  385 |         register bool empty;
      |         ^~~~~~~~
+ '[' 1 -eq 1 ']'

I was unaware kernel must compile with -std=c90 .
I will move the declaration of 'empty' to the declarations header of pipe_write().
Comment 6 Jason Vas Dias 2022-09-07 15:47:53 UTC
Created attachment 301765 [details]
improved patch to raise SIGIO with si_code==POLL_HUP in fs/pipe.c
Comment 7 Jason Vas Dias 2022-09-07 15:58:42 UTC
First build ran out of disk space, so am building again
with latest patch applied. 

I will run kernel tools/testing/selftests , The Linux Test Project (LTP),
and the Fedora kernel regression test suite (https://pagure.io/kernel-tests.git),
and my own test program, which I will append here - the problem
was found with an I/O library I was developing ...
I will cut out of that I/O library just the SIGIO code,
which just listens for SIGIO events for a (pipe) FD and 
and responds to SIGIO with si_code set to one of the
values documented in the manual page - the test will
verify that it now will ALWAYS get a SIGIO with si_code==POLL_HUP
when the single writer of its pipe input fd is killed / closes
the FD .
Comment 8 Jason Vas Dias 2022-09-07 18:16:08 UTC
Created attachment 301768 [details]
testcase : prints FAILURE under unmodified v5.19.6



$ gcc -o t_sigio t_sigio.c 
$ ./t_sigio
t_sigio[2002797]: 336339 Listening to 'cat' (stdin) on pipe fd: 3 with SIGIO....
t_sigio[2002797]: 0 bytes available.
t_sigio[2002797]: Alarmed: killing cat pid 2002798.
t_sigio[2002797]: 0 bytes available.
t_sigio[2002797]: 1131733499: 1131731216: 2283 : SIGIO #1 : (1 , 41).
t_sigio[2002797]: 0 bytes available.
t_sigio[2002797]: FAILURE.

So all that happens when the process kills the 'cat' pid it listens to
is that it gets a SIGIO with 'si_code' set to POLL_IN, when
NO BYTES ARE AVAILABLE (ioctl(fd, FIONREAD, &n_rd) returns with n_rd==0).

It SHOULD get a SIGIO with POLL_HUP in si_code.
Comment 9 Jason Vas Dias 2022-09-07 18:21:09 UTC
The build is finished now, RPMs produced.

I will install this debug v5.19.7 kernel RPM and reboot later & test .
Comment 10 Jason Vas Dias 2022-09-07 20:17:16 UTC
Created attachment 301769 [details]
improved patch to raise SIGIO with si_code==POLL_HUP in fs/pipe.c
Comment 11 Jason Vas Dias 2022-09-07 20:19:03 UTC
The last version of the patch now should fix the problem,
since it also raises POLL_HUP in pipe_release(),
if !writers and fasync_readers, or 
   !readers and fasync_writers .
Comment 12 Jason Vas Dias 2022-09-08 01:41:34 UTC
Wow! Bullseye!

After doing a full yum update, and updating my Laptop's BIOS & 
Dock with gnome-software manufacturer software upgrade,
I then installed my patched (with latest #301769 version above),
and :

$ uname -a
Linux jvdspc.jvds.net 5.19.7-200.fc36.x86_64+debug #1 SMP PREEMPT_DYNAMIC Wed Sep 7 21:14:57 IST 2022 x86_64 x86_64 x86_64 GNU/Linux
[jvd@jvdspc:~ [3230] 02:30:42 [#:3!:6975]{0}	
$ ./t_sigio
t_sigio[3257]: 855318 Listening to 'cat' (stdin) on pipe fd: 3 with SIGIO....
t_sigio[3257]: 0 bytes available.
t_sigio[3257]: Alarmed: killing cat pid 3258.
t_sigio[3257]: 0 bytes available.
t_sigio[3257]: SUCCESS.

It is the fasync_kill in pipe_release() which does it, I am pretty sure.

But the whole Fedora system & GUI came up fine, hundreds of processes
depending in complex intimate ways on pipes are running fine , 
I will try running as many kernel test suites as I can find tomorrow,
but the patch appears to be 100% successful and causes no harm -
and it really does make I/O programming with SIGIO more robust & easy !
Comment 13 Jason Vas Dias 2022-09-08 01:43:00 UTC
In Comment #12, I should have written:
"I then installed my patched kernel ..."
Comment 14 Jason Vas Dias 2022-09-09 13:53:23 UTC
Please consider something like this patch for Linux 6.0 -
without it, it is impossible for a SIGIO receiving process
to reliably determine and be instantly notified when all
writers have closed the write end of a readable pipe, or
when all readers have closed the read end of a writable pipe,
without race conditions ; 
ie. prior to this patch, the only ways to determine if in fact (all of)
the pipe's peer(s) have in fact closed the pipe, was to do:

A) for readers not entering a blocking read() or poll(): 
   call ioctl(fd, FIONREAD, &sz) - if size is then 0, and we know
   for sure the most recent SIGIO event was POLL_IN, then we can
   surmise / guess that the remote writer has hung up.
   But this is prone to race conditions - how can we be positive
   this is the last event, particularly if we've blocked SIGIO
   before calling the ioctl ?
B) for writers:
   They'd HAVE to catch SIGPIPE to get notified, or enter poll() or
   blocking write().

Why FORCE pipe readers / writers to enter poll() or blocking read()/write() ?
   
Looking at the 't_sigio.c' attached testcase, all it has to do
is catch SIGIO, update some memory asynchronously if a SIGIO with
si_code==POLL_HUP arrives, and it knows for sure its writer has 
hung up.  There is not really a better way of detecting this situation ...

And if you look at all extant documentation on POLL_HUP, this is MEANT 
to be sent for ALL O_ASYNC F_SETOWN+F_SETSIG signal owning processes for
ALL types of stream that can go into O_ASYNC + signal notification mode, no ?

So all this patch does is honor the documented POLL_HUP semantics for pipe FDs 
too .

It works great! I have noticed NO differences in anything in my Fedora 36
distro, which includes many many instances of pipe using processes.

And pipe readers DO get instantly notified with this new POLL_HUP si_code
when a SIGIO is sent, when there are no writers on a pipe FD and the buffer
is empty (readers), or when there are no readers on a pipe FD (writers).
Comment 15 Jason Vas Dias 2022-09-09 14:42:32 UTC
Created attachment 301784 [details]
testcase: t_sigio.c

Improved to kill correct 'cat' pid -
when run by root on a busy system, the popen() pid may NOT be pid+1.
This has nothing to do with pipe FD issue being tested -
it could fail because it killed the wrong pid.
Comment 16 Jason Vas Dias 2022-10-04 12:24:31 UTC
Created attachment 302935 [details]
Patch against RHEL8 (Rocky Linux) v4.18.0-372-26.1 / Stable Tree

This patch of v4.18.0 :
  fs/pipe.c , 
builds & is tested against latest Rocky Linux RHEL8 build :
  http://mirrors.vinters.com/rocky/8.6/BaseOS/source/tree/Packages/k/kernel-4.18.0-372.26.1.el8_6.src.rpm
on an x86_64 OK - everything works as expected, AND
the test program passes , unlike WITHOUT the patch applied .
Comment 17 Jason Vas Dias 2022-10-04 13:16:49 UTC
Here is a link to the SRPM for v4.18 based kernels I built & am running OK :
  https://drive.google.com/file/d/1i-kAfDegaLv2NXgh_21iRUVDmZfZuNaR/view?usp=sharing
on a Rocky RHEL8 clone remote fileserver ,
and here is a link to the SRPM for v5.19.12+ based kernels I am running,
with first version of fs/pipe.c patch applied, on my Fedora 36 
x86_64 12-core laptop :
  https://drive.google.com/file/d/1adj1LbooL2NCIEYz3z0mrmaDHeWhvOjR/view?usp=sharing
.
The only difference to the spec files are 
 A) not enabling kernel signing, and 
 B) applying my patch to fs/pipe.c .
They all work fine - no problems observed in server or laptop,
and the point is, how else can programs distinguish between
a closed final write+read with then what would be a read() return value of 0,
which is NOT perceived by my test program, since it does not enter
 read() or poll(),
when ioctl(fd, FIONREAD, &len) returns with len==0 , and a write with 
a still active receiver which can then go on to handle more input, without
possibility of hanging infinitely ?
There is no easy / robust / reliable answer before my patch, IMHO -
entering poll() or read() is what I am able to avoid with this patch,
which reliably MAKES NO DIFFERENCE to ANY existing O_ASYNC pipe reader / 
writer process on a fully up-to-date Rocky Linux distro server system and on a 
fully up-to-date Fedora desktop system , and DOES allow the end-of-input with
length==0 case to be easily distinguished from the waiting-for-more input case .
Comment 18 Jason Vas Dias 2022-11-01 15:02:22 UTC
Both the above patches have been in test on 2 systems for 2 months now,
one laptop x86_64 12-core running Fedora 36 (v5.19.12+, v6.0.6+),
 with patch #1 applied:
  $ sha256sum fs_pipe_c_pipe_fd_sigio_poll_hup.patch 
  97bff61cb6f1f1d97e38d834ec76ea554aa8376c730ac2f76c038dbea029b0e6    \
  fs_pipe_c_pipe_fd_sigio_poll_hup.patch
, and one an old 4/8 core x86_64 Xeon E5606 running Rocky Linux EL8 (v4.18.372+), 
with patch #2 applied:
  $ sha256sum kernel-4.18.0-372-kbz216458_fs_pipe-26.1.el8_6.patch
  69c9d55ea54921c83fc62493a83511e5ddd34163fad50bb35a8bd0d80f87ec3b    \
  kernel-4.18.0-372-kbz216458_fs_pipe-26.1.el8_6.patch
both of which are posted above and have not changed.

Yes, patch #1 applies fine to v6.0.6-200.fc36+ (latest stable release) which
I am running now on the laptop, and have posted a Fedora 36 RPM SPEC file for at:
  https://bugzilla.redhat.com/show_bug.cgi?id=2125104

and patch #2 applies fine to v4.18.0-372.32.1.el8_6 (latest stable release),
which I am running on the remote NFS/DB Cloud Server.

And the patches make programming with pipe FDs MUCH simpler and more robust.

So, please do consider applying these patches to a future Linux release - thanks!
Comment 19 Linus Torvalds 2022-11-01 20:37:03 UTC
Honestly, while it might be better to send POLL_HUP in pipe_release(), this change just isn't _worth_ it. We don't know if any programs end up already depending on our existing behavior, and no new programs should ever depend on the new behavior since it would only mean that said programs then wouldn't work on old kernels.

So it's simply better to leave be, and maintain stable interfaces, when there are no urgent reasons not to.
Comment 20 Linus Torvalds 2022-11-01 22:50:32 UTC
Additional note: one issue is that the 'fasync()' interface is just not very widely used, and when used, the si_code thing was always badly defined.

It was badly designed exactly because it does *not* - despite the name - actually look like the POLL masks at all.  The actual poll() masks can give you multiple events, ie "you have data to be read, _and_ the other end has hung up", but 'si_code' and POLL_xyz is just a value.

End result: just don't use 'si_code'. Use SIGIO to get a signal, then use poll() or select to actually get the proper state of the file descriptor (or just do the IO).

You can go do the Debian code search, and it's fairly instructive: *nobody* actually seems to use si_code and POLL_IN and friends.  It's just not worth the pain, and it's not portable enough, and it's basically just all pain for no gain.

Linux already changed si_code once long ago, and it wasn't worth it. Changing it again is just even less worth it.

If you can point to a real program that (a) uses it and (b) would be noticeably improved by extra effort in the kernel to do something that we've never done and nobody has ever actually cared about, then that would be an interesting data point. But realistically, I just don't see it.
Comment 21 Jason Vas Dias 2022-11-02 17:13:47 UTC
So my SIGIO async IO handling library, for instance, must:
  If ever we have (just got) a SIGIO(si_code==POLL_IN) signal
  and ioctl(fd, FIONREAD, &nrdy) returns with errno==0 
  and 'nrdy'==0 , and my patch is NOT applied to the kernel,
  then immediately set the O_NONBLOCK FD flag, which is
  NOT normally required to be set, and then call read()
  to see if it returns 0 -
   note, we KNOW there are 0 bytes in the buffer, yet must
   call read() with len being at least 1 byte - 
   this breaks the promise of POLL_IN semantics
   that SIGIO(POLL_IN) should only be raised
   when there are bytes actually available to
   read in the buffer.
  Then, if O_NONBLOCK was NOT set on entry, it must be
  unset again.
  Then, if read() has returned 0, then it knows the remote
  writer has hung up; but if not, and we've read 1 byte,
  then it has to include that byte in the next read,
  for which it must call ioctl(fd, FIONREAD, &nrdy) again.

 Note it CANNOT then enter poll() or select() in the above case
 because the last SIGIO event it may ever get for that FD may already
 have occurred, so it could hang forever if it did.

 So 2 extra fcntl() calls and 1 extra ioctl() call and 1 extra read()
 call are required every time ioctl(fd, FIONREAD, &nrdy) returns 
 with ((!errno)&&!nrdy) .
 Contrast that with what it can do if it KNOWS SIGIO(si_code==POLL_HUP)
 will be raised when the remote end hangs up, 
 best illustrated in the 't_sigio.c' loop:


 bool prd=false;
 U32_t
  ac = 0;
 do_it_again:
  alarm(1);
  do
  { prd = prsigio(&ts);
    U32_t n_rd;
    if ( ioctl(fd, FIONREAD, &n_rd) != 0 )
      _ERR_(return 1, "ioctl FIONREAD failed");
    _SAY_("%u bytes available", n_rd);
    if ( n_rd > 0 )
    { n_rd = read(fd, buf, _MIN_(n_rd,sizeof(buf)));
      if ( n_rd > 0 )
        _SAY_("read %u bytes", n_rd);
      else
        _ERR_(break, "read <=0 bytes");
    }else
    { prd=false;
      unblock_sigio(&ssbl);
      pause();
      block_sigio(&ssbl);
    }
  } while ( (!alarmed) && (sigio_si_code != POLL_HUP) );
  

Since it knows SIGIO(si_code==POLL_IN) will ONLY be raised when bytes are
actually available to be read, and it knows SIGIO(si_code==POLL_HUP) will
be raised when the remote end has called pipe_release() (hung up), 
then the loop can be very simple, very robust, and avoid making
4 extra system calls and needing a much more complex loop structure 
to handle all the edge cases. The above loop will NEVER fail to
to receive sent bytes , and will NEVER hang waiting for a SIGIO
event that never will occur, and DOES handle the case where
both ioctl(fd,FIONREAD,&nrdy) returns with nrdy>0 AND si_code==POLL_HUP.

It means pipe FDs can be used with just as much precision as to when
the remote end has closed its connection as can socket FDs, and with
the SAME SIGIO using code.
  
So now Linux enforces on all users of SIGIO based O_ASYNC but not O_NONBLOCK
FD I/O very complicated edge case handling and extra system call load , so that 
indeed, NOW, currently,  
  "*nobody* actually seems to use si_code and POLL_IN and friends.  
   It's just not worth the pain, and it's not portable enough, 
   and it's basically just all pain for no gain.
  "
But that is because it is made unreliable by this missing POLL_HUP
peer pipe_release() indication ! 

If Linux had reliable indication of the remote end of a pipe FD 
hanging up , without having to call read(), or having to have initially
entered poll() or select(), for which one must create a contiguous array of
ALL poll()ed FDs (not feasible for all apps) - (plus select() cannot handle
more than 1024 such FDs, so then even more complexity is imposed on programs) --
I/O with pipes would be ALOT more reliable, without
possibility of hanging for ever, much easier to code for, and
SIGIO(si_code==X) would be reliable and robust, efficient and 
easy to use, so then in a few years you would see many more
programs & libraries like my SIGIO / async-IO library that can
rely on robust SIGIO based pipe FD I/O , which they currently cannot
without incurring heavy performance penalties and code structural complexity.
  
And I've been running these patches on a busy EL8 file & DB server, and on
a busy fully loaded linux laptop, with thousands of heavy pipe using processes
showing no problems when running under linux with either of the patches -
if they do not know about this change, they should be (and are, AFAICS)
entirely unaffected by it, it can only mean they get an extra event
when read() or write() would return 0 bytes read/written because the 
remote end has hung up; so this patch would only make processes close 
disconnected pipe FDs faster - that is the only effect it COULD have,
besides providing the great advance in pipe FD usage ease & reliability.

So, please reconsider, at least try the patch out - you will see it causes NO
issues with existing code - and consider all the extra complexity and system 
call load that users must implement in order to use async pipe FDs reliably 
and 100% free from hang-forever situations without it.

Note You need to log in before you can comment on or make changes to this bug.