Sometimes a pts can screw so bad, that any program reading from it returns -EAGAIN: root@apu:~# cat cat: -: Resource temporarily unavailable root@apu:~# strace -e read cat read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0000,\2\0\0\0\0\0"..., 832) = 832 read(0, 0x7f5093da4000, 131072) = -1 EAGAIN (Resource temporarily unavailable) cat: -read(3, "# Locale name alias data base.\n#"..., 4096) = 2995 read(3, "", 4096) = 0 : Resource temporarily unavailable +++ exited with 1 +++ I experienced this issue on both Fedora and Debian, the workaround is to close the pts (eg. exit the SSH session) and open a new shell/session.
I experience the same issue on a different machine with a vanilla 5.0 kernel: root@saturno:~# strace -e read cat read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\4\2\0\0\0\0\0"..., 832) = 832 read(0, 0x7f2a70511000, 131072) = -1 EAGAIN (Resource temporarily unavailable) cat: -read(3, "# Locale name alias data base.\n#"..., 4096) = 2995 read(3, "", 4096) = 0 : Resource temporarily unavailable +++ exited with 1 +++ root@saturno:~# stty speed 38400 baud; line = 0; -brkint -imaxbel iutf8 root@saturno:~# ll /proc/self/fd/ total 0 lrwx------ 1 root root 64 mar 22 02:51 0 -> /dev/pts/5 lrwx------ 1 root root 64 mar 22 02:51 1 -> /dev/pts/5 lrwx------ 1 root root 64 mar 22 02:51 2 -> /dev/pts/5 lr-x------ 1 root root 64 mar 22 02:51 3 -> /proc/2919/fd unfortunately I don't have perf available on this machine
I don't have kprobes copiled in, but I managed to run the bug in a tight loop by running this in the affected pts: # while : ; do read a; done 2>/dev/null and started 'perf top -U' in another terminal, I get this: Overhead Shared Object Symbol 22,56% [kernel] [k] nmi 18,95% [kernel] [k] syscall_return_via_sysret 12,02% [kernel] [k] n_tty_read 11,80% [kernel] [k] __entry_text_start 9,43% [kernel] [k] entry_SYSCALL_64_after_hwframe 6,62% [kernel] [k] tty_mode_ioctl 3,39% [kernel] [k] _raw_spin_lock_irqsave 2,51% [kernel] [k] do_vfs_ioctl 1,23% [kernel] [k] do_syscall_64 1,04% [kernel] [k] __fget_light 0,89% [kernel] [k] __fdget_pos 0,87% [kernel] [k] tty_ioctl 0,82% [kernel] [k] n_tty_ioctl 0,66% [kernel] [k] ldsem_down_read 0,65% [kernel] [k] up_read 0,61% [kernel] [k] vfs_write 0,60% [kernel] [k] tty_read 0,44% [kernel] [k] fsnotify 0,44% [kernel] [k] tty_hung_up_p There are only a few -EAGAIN in n_tty_read, so we could start investigating this one
root@saturno:~# stty -a speed 38400 baud; rows 37; columns 167; line = 0; intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = <undef>; eol2 = <undef>; swtch = <undef>; start = ^Q; stop = ^S; susp = ^Z; rprnt = ^R; werase = ^W; lnext = ^V; discard = ^O; min = 1; time = 0; -parenb -parodd -cmspar cs8 -hupcl -cstopb cread -clocal -crtscts -ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl -ixon -ixoff -iuclc -ixany -imaxbel iutf8 opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0 isig icanon iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt echoctl echoke -flusho -extproc root@saturno:~# cat cat: -: Resource temporarily unavailable
Running 'cat /dev/pts/5' from another terminal fixed it. This seems similar to this one: https://git.kernel.org/tglx/history/c/03b1fb24c60a
Hi all, I managed to reproduce the issue again, this time on an arm64 machine running latest 5.1.0-rc2 kernel. I wrote a test program which prints the flags of a file descriptor: #include <stdio.h> #include <unistd.h> #include <fcntl.h> int main(int argc, char *argv[]) { int fd = 0; if (argc > 1) fd = open(argv[1], O_RDONLY); printf("flags: %x\n", fcntl(fd, F_GETFL)); return 0; } and I ran it on the tty which is experiencing the issue: # strace -e fcntl ./nonblock fcntl(0, F_GETFL) = 0xc02 (flags O_RDWR|O_APPEND|O_NONBLOCK) flags: c02 So the problem is that, somehow, the pts gets the O_NONBLOCK flag. At this point I don't know if it's a kernel bug, or some userspace process which sets the non blocking mode, so I wrote another tool which sets the O_NONBLOCK flag to reproduce the issue, but this didn't work, probably because BASH clears it after that the child exits.
Hi all, `exec bash` resolves the issue, so I think it's a bash issue, which leaves the pts with O_NONBLOCK set. Closing the bug.