The readahead(2) man page says that it blocks until the requested data has been read. The whole point of this call is that it does NOT block but initiates background reads of data that will be required soon, so that when it is read, hopefully it can be done without blocking.
(In reply to comment #0)
> The readahead(2) man page says that it blocks until the requested data has
> been read. The whole point of this call is that it does NOT block but
> initiates background reads of data that will be required soon, so that when
> it is read, hopefully it can be done without blocking.

How did you verify or test this assertion?
Common usage of the call (see the ureadahead and sreadahead packages), inspection of blktrace data while using these packages, discussion on linux-mm, and inspection of the kernel sources. The call *can* block, for instance, if it must read ext2/3 indirect blocks to map the data blocks so their reads can be queued, but the whole point of the call is to *avoid* blocking.
So, I'm no expert here, but on my system (ext4), reading a 1 GB file (with large userspace buffers) takes about 12 seconds. Doing a readahead() of a 1GB file also takes about 12 seconds. If I'm understanding what you are saying correctly, the readahead() should return much faster than the file-read case. What am I missing?
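The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the test program actually used in this report: the helper names `time_read_loop` and `time_readahead` are hypothetical, and the sketch assumes Linux with a libc that exports readahead(2), reached here through ctypes.

```python
import ctypes
import os
import time


def time_read_loop(path, bufsize=1 << 20):
    """Read the whole file with read(2) in bufsize chunks; return (bytes, seconds)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        total = 0
        start = time.monotonic()
        while True:
            chunk = os.read(fd, bufsize)
            if not chunk:
                break
            total += len(chunk)
        return total, time.monotonic() - start
    finally:
        os.close(fd)


def time_readahead(path):
    """Call readahead(2) once over the whole file; return seconds until it returns."""
    libc = ctypes.CDLL(None, use_errno=True)
    readahead = libc.readahead  # Linux-specific; AttributeError on other systems
    # ssize_t readahead(int fd, off64_t offset, size_t count)
    readahead.argtypes = [ctypes.c_int, ctypes.c_longlong, ctypes.c_size_t]
    readahead.restype = ctypes.c_ssize_t
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        start = time.monotonic()
        ret = readahead(fd, 0, size)
        elapsed = time.monotonic() - start
        if ret != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
        return elapsed
    finally:
        os.close(fd)
```

For the numbers to mean anything, the page cache has to be dropped (as root) before each measurement; otherwise both calls return almost immediately from cached data.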
Around two years back I dug into why readahead() was blocking when it wasn't supposed to and after discussion on linux-mm, it turned out to be due to ext3 using indirect blocks which had to be read first to learn the location of the data blocks before they could be queued. Switching to ext4 extents fixed that and readahead() stopped blocking. It seems there has been a regression recently so I started a thread on linux-mm to try and figure it out.
After closer inspection, it turned out to be the same issue as before. I was testing on an ISO file I had downloaded with BitTorrent, and looking at it with debugfs showed that while the data blocks were contiguous, the extent tree was fragmented. That forces ext4 to block the readahead while it reads extent tree blocks, the last of which was fairly close to the end of the file. After creating a new file with dd from /dev/zero, and verifying it did not have the extent tree problem, readahead() does not block on the file. Specifically, running readahead ; iotop -d 2 (after drop_caches) immediately completes the readahead, then iotop shows lots of read throughput for several seconds. So it seems that there are still some ext4 issues that cause readahead to block more than you would want it to, but as I said before, the call is not supposed to block if possible.
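One way to watch for the background reads after readahead() has returned, without iotop or iostat, is to sample /proc/diskstats directly. The following is only a sketch: the function names are hypothetical, and it relies on the documented /proc/diskstats layout in which the sixth whitespace-separated field is the count of 512-byte sectors read.

```python
import time


def parse_sectors_read(diskstats_text):
    """Map device name -> sectors read, from /proc/diskstats-formatted text."""
    sectors = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue
        # Fields: major, minor, name, reads completed, reads merged, sectors read, ...
        sectors[fields[2]] = int(fields[5])
    return sectors


def read_rates(before, after, interval_s):
    """KiB/s read per device between two snapshots (diskstats sectors are 512 bytes)."""
    return {
        dev: (after[dev] - before[dev]) * 512 / 1024 / interval_s
        for dev in after
        if dev in before
    }


def watch(interval_s=2.0, samples=4, path="/proc/diskstats"):
    """Print per-device read throughput a few times, one report per interval."""
    with open(path) as f:
        prev = parse_sectors_read(f.read())
    for _ in range(samples):
        time.sleep(interval_s)
        with open(path) as f:
            cur = parse_sectors_read(f.read())
        for dev, rate in sorted(read_rates(prev, cur, interval_s).items()):
            if rate > 0:
                print(f"{dev:8s} {rate:12.2f} KiB/s read")
        prev = cur
```

Calling `watch()` right after the readahead() call returns should show read throughput continuing for a few seconds if the call really did only queue the reads.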
I'd be interested to see a (simple) script or log file so I can follow your steps. I tried repeating the same measurements (read() an entire file, readahead() the entire file) on a variety of other file systems: XFS, JFS, ReiserFS. In each case, the time taken for the read loop to complete and the time taken for the readahead() system call to complete were similar (10 to 15 seconds for a 1GB file).
dd if=/dev/zero of=foo bs=1MiB count=512
echo 1 > /proc/sys/vm/drop_caches
time readahead foo && iostat -d 2

real    0m0.212s
user    0m0.000s
sys     0m0.164s

Device:      tps   kB_read/s   kB_wrtn/s    kB_read     kB_wrtn
sda        13.07      348.27      564.72   24916849    40401892
sdb        12.34      342.80      564.88   24525467    40413276
sdc        12.64      353.08      558.45   25260487    39953612
sdd        12.70      344.31      561.36   24633134    40162088
md0       128.52      690.00     1518.61   49365215   108647060
dm-0        4.09       68.50       14.09    4900585     1008368
dm-1        1.13      529.16       10.01   37857989      716364
dm-2        0.00        0.01        0.00        748           0
dm-3      104.29       16.60     1494.47    1187561   106920224

Device:      tps   kB_read/s   kB_wrtn/s    kB_read     kB_wrtn
sda        83.50    42244.00        0.50      84488           1
sdb        73.50    37122.00        0.50      74244           1
sdc        76.50    38658.00        0.50      77316           1
sdd        80.00    40450.00        0.50      80900           1
md0         1.00        2.00        2.00          4           4
dm-0        1.00        2.00        2.00          4           4
dm-1        0.00        0.00        0.00          0           0
dm-2        0.00        0.00        0.00          0           0
dm-3        0.00        0.00        0.00          0           0

Device:      tps   kB_read/s   kB_wrtn/s    kB_read     kB_wrtn
sda        33.00    16384.00        2.50      32768           5
sdb        40.00    20224.00        0.50      40448           1
sdc        37.00    18688.00        0.50      37376           1
sdd        32.00    15872.00        2.50      31744           5
md0         0.00        0.00        0.00          0           0
dm-0        0.00        0.00        0.00          0           0
dm-1        0.00        0.00        0.00          0           0
dm-2        0.00        0.00        0.00          0           0
dm-3        0.00        0.00        0.00          0           0

Device:      tps   kB_read/s   kB_wrtn/s    kB_read     kB_wrtn
sda         1.50        0.00        1.00          0           2
sdb         2.50        0.00       19.00          0          38
sdc         1.50        0.00        1.00          0           2
sdd         2.50        0.00       19.00          0          38
md0         5.00        0.00       18.00          0          36
dm-0        4.50        0.00       18.00          0          36
dm-1        0.00        0.00        0.00          0           0
dm-2        0.00        0.00        0.00          0           0
dm-3        0.00        0.00        0.00          0           0

When I was seeing the blocking, the stat command on the file in a debugfs session showed the following:

EXTENTS:
(ETB0):33795, (0-30975):370688-401663, (30976-40191):401664-410879, (ETB0):483328, (40192-72575):410880-443263, (72576-81919):443264-452607, (ETB0):483329, (81920-111578):452608-482266

As you can see, the file has 3 level-zero extent tree blocks.
That last extent of blocks (452608-482266) can't be queued for reading until the ETB at 483329 has been read. The ETB read sits behind all of the earlier data blocks in the queue, so the readahead blocks until those have been read; once it has the ETB, it queues the last extent and returns. The freshly created file with dd has no extent tree blocks to read, so no blocking happens.
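To make the layout easier to see, here is a small sketch that parses the EXTENTS line quoted in this report and works out how much of the file is mapped before the last extent tree block is needed. The function names are hypothetical, and the grouping rule (each ETB entry maps the data extents listed after it, up to the next ETB entry) is an assumption that follows the reading given above rather than a documented debugfs guarantee.

```python
import re

# EXTENTS line copied from the debugfs stat output quoted in this report.
EXTENTS = (
    "(ETB0):33795, (0-30975):370688-401663, (30976-40191):401664-410879, "
    "(ETB0):483328, (40192-72575):410880-443263, (72576-81919):443264-452607, "
    "(ETB0):483329, (81920-111578):452608-482266"
)

# Matches "(ETB0):33795" as well as "(0-30975):370688-401663".
_ITEM = re.compile(r"\((ETB\d+|\d+-\d+)\):(\d+)(?:-(\d+))?")


def group_by_tree_block(extents_line):
    """Return [(etb_physical_block, [(logical_start, logical_end), ...]), ...]."""
    groups = []
    for label, phys_start, _phys_end in _ITEM.findall(extents_line):
        if label.startswith("ETB"):
            groups.append((int(phys_start), []))
        else:
            if not groups:
                # Data extents with no preceding ETB entry (no tree block needed).
                groups.append((None, []))
            lo, hi = (int(x) for x in label.split("-"))
            groups[-1][1].append((lo, hi))
    return groups


def blocks_before_last_tree_block(groups):
    """Count logical blocks mapped by every group except the last one."""
    return sum(hi - lo + 1 for _etb, exts in groups[:-1] for lo, hi in exts)


groups = group_by_tree_block(EXTENTS)
total = sum(hi - lo + 1 for _etb, exts in groups for lo, hi in exts)
before = blocks_before_last_tree_block(groups)
print(f"{len(groups)} extent tree blocks; last one at physical block {groups[-1][0]}")
print(f"{before} of {total} logical blocks precede the extents it maps")
```

For the file in this report, 81920 of the 111579 logical blocks come before the extents mapped by the last tree block, so roughly three quarters of the file is already queued ahead of that ETB read.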
Bump.
Created attachment 129481: Patch from Phillip Susi

Patch submitted to linux-man by Phillip Susi on 2014-03-14
Applied Phillip's patch, with some tweaks.