Bug 15272

Summary: epoll_ctl(2) fails on regular plain files
Product: IO/Storage
Component: Other
Reporter: Stephane Thiell (stephane.thiell)
Assignee: Davide Libenzi (davidel)
Status: RESOLVED WILL_NOT_FIX
Severity: normal
Priority: P1
CC: akpm, alan
Hardware: All
OS: Linux
Kernel Version: 2.6.31.6-145.fc11
Subsystem:
Regression: No
Bisected commit-id:

Description Stephane Thiell 2010-02-11 14:00:27 UTC
epoll_ctl(2) doesn't work for regular files, or for stdin (fd 0) when it is redirected from a regular file, and returns EPERM. On FreeBSD, for example, kqueue(2) seems able to watch plain files for readiness, so why doesn't epoll allow this? Please note that poll(2) on the same kernel works just fine for plain files. A simple reproducer using fd 0 (stdin) is shown below. The same happens for plain files opened with open(2).

It's annoying when an application wants to handle fd 0 the same way whether it is a pipe or a file.


>> reproducer.c:

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/epoll.h>

int
main(int argc, char **argv)
{
  struct epoll_event event;
  int epfd, rc;

  event.events = EPOLLIN;
  event.data.fd = 0;  /* avoid passing uninitialized user data to the kernel */

  epfd = epoll_create(1023);
  if (epfd < 0) {
    printf("epoll_create failed %d\n", epfd);
    printf("errno %d %s\n", errno, strerror(errno));
    return 1;
  }
  errno = 0;
  rc = epoll_ctl(epfd, EPOLL_CTL_ADD, 0, &event);
  printf("epoll_ctl returned %d\n", rc);
  printf("errno %d %s\n", errno, strerror(errno));
  return -rc;
}

$ echo foobar | ./reproducer 
epoll_ctl returned 0
errno 0 Success

$ echo foobar >/tmp/foobar

$ ./reproducer < /tmp/foobar
epoll_ctl returned -1
errno 1 Operation not permitted
Comment 1 Davide Libenzi 2010-02-12 19:46:05 UTC
Regular files do not support the Linux ->poll() file operation.
If you want to wait for events on regular files, you need to use AIO+eventfd+epoll.
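A minimal sketch of the AIO+eventfd+epoll combination mentioned above, assuming the raw Linux AIO syscalls (io_setup, io_submit, io_getevents) are available and not blocked by a sandbox. The helper name aio_read_via_epoll is hypothetical and the error handling is kept deliberately thin. It submits one read on a regular file, asks the kernel to signal an eventfd on completion, and waits for that eventfd with epoll:

```c
#define _GNU_SOURCE
#include <linux/aio_abi.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: read up to len bytes from offset 0 of a regular file,
 * waiting for completion through epoll. Returns bytes read, or -1 on error. */
static long aio_read_via_epoll(int fd, void *buf, size_t len)
{
    aio_context_t ctx = 0;
    struct iocb cb, *list[1] = { &cb };
    struct io_event done;
    struct epoll_event ev, fired;
    uint64_t count;
    long result = -1;
    int efd = -1, epfd = -1;

    if (syscall(SYS_io_setup, 1, &ctx) < 0)
        return -1;
    efd = eventfd(0, 0);
    epfd = epoll_create1(0);
    if (efd < 0 || epfd < 0)
        goto cleanup;

    /* The eventfd supports ->poll(), so epoll accepts it. */
    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN;
    ev.data.fd = efd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev) < 0)
        goto cleanup;

    /* Describe the read and request an eventfd notification on completion. */
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_lio_opcode = IOCB_CMD_PREAD;
    cb.aio_buf = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = 0;
    cb.aio_flags = IOCB_FLAG_RESFD;
    cb.aio_resfd = efd;
    if (syscall(SYS_io_submit, ctx, 1, list) != 1)
        goto cleanup;

    /* Wait for the eventfd to become readable, then drain its counter. */
    if (epoll_wait(epfd, &fired, 1, -1) != 1)
        goto cleanup;
    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        goto cleanup;

    /* Collect the completion record; res holds the byte count or -errno. */
    if (syscall(SYS_io_getevents, ctx, 1, 1, &done, NULL) != 1)
        goto cleanup;
    result = done.res < 0 ? -1 : (long)done.res;

cleanup:
    if (epfd >= 0)
        close(epfd);
    if (efd >= 0)
        close(efd);
    syscall(SYS_io_destroy, ctx);
    return result;
}
```

In a real event loop the epoll instance and AIO context would be long-lived and shared with the sockets and pipes being watched; they are created and destroyed inside the helper here only to keep the sketch self-contained.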
Comment 2 Davide Libenzi 2010-02-12 19:50:16 UTC
BTW, this is not an epoll(4) problem. Not even poll(2) or select(2) will work with that example.
Comment 3 Stephane Thiell 2010-02-13 19:53:35 UTC
Davide,
Thank you for your reply. I understand comment #1, but regarding comment #2, it looks like poll(2) does report read availability on a regular file (simple case, file not being modified). For example:

>> reproducer-poll.c:

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <poll.h>


int
main(int argc, char **argv)
{
  struct pollfd onepollfd = { 0, POLLIN };
  int nfds;

  errno = 0;
  nfds = poll(&onepollfd, 1, -1);
  printf("nfds=%d\n", nfds);
  if (nfds < 0) {
    printf("poll failed %d\n", nfds);
    printf("errno %d %s\n", errno, strerror(errno));
    return 1;
  } else if (nfds > 0) {
    printf("revents 0x%x (POLLIN=0x%x, POLLERR=0x%x)\n",
        onepollfd.revents, POLLIN, POLLERR);
  }
  return 0;
}


$ ./reproducer-poll < /tmp/foobar 
nfds=1
revents 0x1 (POLLIN=0x1, POLLERR=0x8)


But, while writing this post, I noticed that the read event is always reported, even when there is no data left to read in the file (/tmp/foobar). Still, this behaviour is useful for simple user applications that use a single handler for socket, pipe and regular file descriptors. Would it be possible to make epoll(4) behave like poll(2) for regular files? Or maybe poll(2) should return an error for regular files, like POLLERR?
Comment 4 Davide Libenzi 2010-02-13 21:53:00 UTC
It seems it is working, but it isn't ;)
If you look at fs/select.c, lines 723 to 731, you will notice that when f_op->poll is not provided by the device, DEFAULT_POLLMASK is used as the returned mask, where DEFAULT_POLLMASK is defined as (POLLIN | POLLOUT | POLLRDNORM | POLLWRNORM).
Later on, this DEFAULT_POLLMASK is masked with your requested events, which yields POLLIN, even though no test has really been performed on the device, since a regular file does not provide an f_op->poll() function.
Epoll will fail you explicitly, while poll/select will not, but nothing meaningful is returned from poll/select on file system files.
Maybe poll/select should be changed to return POLLERR or POLLNVAL in case the device is not supported, but dunno if this is "POSIX Legal".
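
For applications that want a single code path for sockets, pipes and regular files, one possible workaround is to treat EPERM from epoll_ctl(2) as "this descriptor has no ->poll() support" and handle such descriptors as always ready, which mirrors what poll(2) effectively reports through DEFAULT_POLLMASK. The sketch below is only an illustration of that idea; watch_fd is a hypothetical helper name, not an existing API.

```c
#include <errno.h>
#include <sys/epoll.h>

/* Hypothetical helper, not part of any library.
 * Returns  0 if fd was registered with the epoll instance,
 *          1 if fd has no ->poll() support (caller treats it as always ready),
 *         -1 on any other error (errno is left set by epoll_ctl). */
static int watch_fd(int epfd, int fd)
{
    struct epoll_event ev = { 0 };

    ev.events = EPOLLIN;
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0)
        return 0;
    if (errno == EPERM)
        return 1;   /* e.g. a regular file: read it directly instead */
    return -1;
}
```

A caller would keep descriptors for which watch_fd returns 1 on a separate list and read from them directly, using epoll_wait(2) only for the descriptors that were actually registered.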