How to reproduce: 1. # echo "0 $((512*10000)) error" | dmsetup create ioerr 2. #define _GNU_SOURCE #include <sys/socket.h> #include <sys/un.h> #include <fcntl.h> #include <unistd.h> #define CNT (4096) int main () { int errfd=open("/dev/mapper/ioerr",O_RDONLY|O_NONBLOCK); int goodfd=open("/dev/urandom",O_RDONLY|O_NONBLOCK); int p[2]; char buf[CNT*(3+1)]; pipe(p); splice(goodfd,0,p[1],0,CNT,SPLICE_F_MOVE|SPLICE_F_NONBLOCK); /* marked line. see below */ splice(errfd,0,p[1],0,CNT,SPLICE_F_MOVE|SPLICE_F_NONBLOCK); splice(goodfd,0,p[1],0,CNT,SPLICE_F_MOVE|SPLICE_F_NONBLOCK); close(errfd); close(goodfd); read(p[0],buf,sizeof(buf)); read(p[0],buf,sizeof(buf)); read(p[0],buf,sizeof(buf)); read(p[0],buf,sizeof(buf)); return 0; } 3. strace: open("/dev/mapper/ioerr", O_RDONLY|O_NONBLOCK) = 3 open("/dev/urandom", O_RDONLY|O_NONBLOCK) = 4 pipe([5, 6]) = 0 splice(0x4, 0, 0x6, 0, 0x1000, 0x3) = 4096 splice(0x3, 0, 0x6, 0, 0x1000, 0x3) = 4096 splice(0x4, 0, 0x6, 0, 0x1000, 0x3) = 4096 close(3) = 0 close(4) = 0 read(5, "\222\272\376\22N\310\371_kM!G\3212\336\306g\25\341\r\361\321\273\242 `\26f\317G\235\314"..., 16384) = 4096 read(5, "", 16384) = 0 read(5, "", 16384) = 0 read(5, "", 16384) = 0 exit_group(0) = ? 4. ACHTUNG. read() should return at least 2*4096 bytes (two pages from urandom) !!!!! 5. got to marked line. replace errfd with goodfd, and voila! read returns 3 pages. The bug: There is no way to detect io erros. in mmap()'ed environment we will receive signal, in splice()'ed io we will get o_O.
In spite of SPLICE_F_MOVE, kernel read pages from ioerr device. I saw messages in dmesg output even when read() calls was commented. It is OK, I understand, that this flag is not implemented now. 1. But i don't understand why second splice() call does not return error in that case. 2. If SPLICE_F_MOVE will be implemented, read from pipe sohuld return io error. does not it ? it seems that this never happen by design :( P.S. without any flags (O_NONBLOCK,SPLICE_F_NONBLOCK,SPLICE_F_MOVE) bahviour is exactly the same.
Trying to splice this bad page from pipe to file will give splice() returning 0. this is not correct difinitely.
http://lwn.net/Articles/119680/ : --------8<-------------- - error handling is broken - if an IO error happens on the splice, the "map()" operation will return NULL, and the current fs/pipe.c can't handle that and will oops ;) --------8<-------------- It seems, that root of the problem is in far 2005 :)
fs/splice.c (near the line 382): --------------- 8<--------- error = ops->confirm(pipe, buf); if (error) { if (!ret) error = ret; break; } --------------- 8<--------- maybe ret=error instead of error=ret ? (as in line 403)
The patch has been committed to stable tree. http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=e5953cbdff26f7cbae7eff30cd9b18c4e19b7594