Bug 201641 - reading from PTS returns -EAGAIN
Summary: reading from PTS returns -EAGAIN
Status: RESOLVED INVALID
Alias: None
Product: Drivers
Classification: Unclassified
Component: Other (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: drivers_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-11-09 00:37 UTC by Matteo Croce
Modified: 2019-09-30 11:42 UTC (History)
2 users (show)

See Also:
Kernel Version: 5.1.0-rc2
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description Matteo Croce 2018-11-09 00:37:26 UTC
Sometimes a pts can screw so bad, that any program reading from it returns -EAGAIN:

root@apu:~# cat
cat: -: Resource temporarily unavailable
root@apu:~# strace -e read cat
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0000,\2\0\0\0\0\0"..., 832) = 832
read(0, 0x7f5093da4000, 131072)         = -1 EAGAIN (Resource temporarily unavailable)
cat: -read(3, "# Locale name alias data base.\n#"..., 4096) = 2995
read(3, "", 4096)                       = 0
: Resource temporarily unavailable
+++ exited with 1 +++


I experienced this issue on both Fedora and Debian, the workaround is to close the pts (eg. exit the SSH session) and open a new shell/session.
Comment 1 Matteo Croce 2019-03-22 01:53:18 UTC
I experience the same issue on a different machine with a vanilla 5.0 kernel:

root@saturno:~# strace -e read cat
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\4\2\0\0\0\0\0"..., 832) = 832
read(0, 0x7f2a70511000, 131072)         = -1 EAGAIN (Resource temporarily unavailable)
cat: -read(3, "# Locale name alias data base.\n#"..., 4096) = 2995
read(3, "", 4096)                       = 0
: Resource temporarily unavailable
+++ exited with 1 +++
root@saturno:~# stty 
speed 38400 baud; line = 0;
-brkint -imaxbel iutf8
root@saturno:~# ll /proc/self/fd/
total 0
lrwx------ 1 root root 64 mar 22 02:51 0 -> /dev/pts/5
lrwx------ 1 root root 64 mar 22 02:51 1 -> /dev/pts/5
lrwx------ 1 root root 64 mar 22 02:51 2 -> /dev/pts/5
lr-x------ 1 root root 64 mar 22 02:51 3 -> /proc/2919/fd


unfortunately I don't have perf available on this machine
Comment 2 Matteo Croce 2019-03-22 02:00:54 UTC
I don't have kprobes copiled in, but I managed to run the bug in a tight loop by running this in the affected pts:

# while : ; do read a; done 2>/dev/null

and started 'perf top -U' in another terminal, I get this:

Overhead  Shared Object     Symbol                                                                                                                                     
  22,56%  [kernel]          [k] nmi
  18,95%  [kernel]          [k] syscall_return_via_sysret
  12,02%  [kernel]          [k] n_tty_read
  11,80%  [kernel]          [k] __entry_text_start
   9,43%  [kernel]          [k] entry_SYSCALL_64_after_hwframe
   6,62%  [kernel]          [k] tty_mode_ioctl
   3,39%  [kernel]          [k] _raw_spin_lock_irqsave
   2,51%  [kernel]          [k] do_vfs_ioctl
   1,23%  [kernel]          [k] do_syscall_64
   1,04%  [kernel]          [k] __fget_light
   0,89%  [kernel]          [k] __fdget_pos
   0,87%  [kernel]          [k] tty_ioctl
   0,82%  [kernel]          [k] n_tty_ioctl
   0,66%  [kernel]          [k] ldsem_down_read
   0,65%  [kernel]          [k] up_read
   0,61%  [kernel]          [k] vfs_write
   0,60%  [kernel]          [k] tty_read
   0,44%  [kernel]          [k] fsnotify
   0,44%  [kernel]          [k] tty_hung_up_p

There are only a few -EAGAIN in n_tty_read, so we could start investigating this one
Comment 3 Matteo Croce 2019-03-22 09:36:05 UTC
root@saturno:~# stty -a
speed 38400 baud; rows 37; columns 167; line = 0;
intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = <undef>; eol2 = <undef>; swtch = <undef>; start = ^Q; stop = ^S; susp = ^Z; rprnt = ^R; werase = ^W;
lnext = ^V; discard = ^O; min = 1; time = 0;
-parenb -parodd -cmspar cs8 -hupcl -cstopb cread -clocal -crtscts
-ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl -ixon -ixoff -iuclc -ixany -imaxbel iutf8
opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
isig icanon iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt echoctl echoke -flusho -extproc

root@saturno:~# cat
cat: -: Resource temporarily unavailable
Comment 4 Matteo Croce 2019-03-22 09:50:16 UTC
Running 'cat /dev/pts/5' from another terminal fixed it.

This seems similar to this one:

https://git.kernel.org/tglx/history/c/03b1fb24c60a
Comment 5 Matteo Croce 2019-03-29 17:17:48 UTC
Hi all,

I managed to reproduce the issue again, this time on an arm64 machine running latest 5.1.0-rc2 kernel.

I wrote a test program which prints the flags of a file descriptor:

    #include <stdio.h>
    #include <unistd.h>
    #include <fcntl.h>

    int main(int argc, char *argv[])
    {
            int fd = 0;

            if (argc > 1)
                    fd = open(argv[1], O_RDONLY);

            printf("flags: %x\n", fcntl(fd, F_GETFL));

            return 0;
    }

and I ran it on the tty which is experiencing the issue:

    # strace -e fcntl ./nonblock
    fcntl(0, F_GETFL) = 0xc02 (flags O_RDWR|O_APPEND|O_NONBLOCK)
    flags: c02

So the problem is that, somehow, the pts gets the O_NONBLOCK flag.
At this point I don't know if it's a kernel bug, or some userspace process which sets the non blocking mode, so I wrote another tool which sets the O_NONBLOCK flag to reproduce the issue, but this didn't work, probably because BASH clears it after that the child exits.
Comment 6 Matteo Croce 2019-09-30 11:42:15 UTC
Hi all,

`exec bash` resolves the issue, so I think it's a bash issue, which leaves the pts with O_NONBLOCK set.

Closing the bug.

Note You need to log in before you can comment on or make changes to this bug.