Most recent kernel where this bug did not occur:
Distribution: Gentoo, RHAS4.3, FC5
Hardware Environment: various
Software Environment: libaio
After io_getevents() reports that the write/append has completed, the data in the
file is still inaccessible (neither reported by fstat() nor readable via io_submit()).
Steps to reproduce:
gcc aiotest-wait.c -laio -lpthread
Run the attached test on an SMP machine a few times.
For UP, edit the source and initialize thread_write with fun_write1().
NOTE: if synchronization is introduced (not calling fstat() until the write's
io_submit() returns), all is fine. To test that, compile with
gcc aiotest-wait.c -DDO_SYNCH -laio -lpthread
Created attachment 8547 [details]
testcase showing the race
> after io_getevents reports that write/append was done, the data in file is
> inaccessible (neither reported by fstat() nor readable via io_submit())
Yeah, this looks like a real bug. The path under generic_file_direct_IO() is
calling aio_complete() before its caller updates i_size. I think this test is
seeing it because it has one thread waiting in io_getevents() while another
thread is submitting the extending write. Usually the completion isn't seen
until io_submit() returns after having updated i_size.
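The ordering problem described above can be illustrated with a small userspace model. This is only a sketch: `struct model_file`, `model_extending_write()` and the two enum values are hypothetical names invented for illustration and are not kernel code. The waiter callback stands in for a thread returning from io_getevents() and immediately checking the file size.

```c
#include <assert.h>

/* Toy model only: hypothetical names, not kernel structures. */
struct model_file {
    long i_size;         /* size an fstat()-style reader would see */
    long size_at_event;  /* size the waiter saw when the completion arrived */
};

enum completion_order {
    COMPLETE_BEFORE_SIZE_UPDATE, /* the ordering described as buggy */
    SIZE_UPDATE_BEFORE_COMPLETE  /* the ordering a waiter expects */
};

/* Stands in for a thread woken from io_getevents(): it samples the
 * size at the moment the completion event becomes visible. */
static void waiter_sees_completion(struct model_file *f)
{
    f->size_at_event = f->i_size;
}

/* Model an extending write of `len` bytes at end of file and return
 * the size the waiter observed at completion time. */
long model_extending_write(struct model_file *f, long len,
                           enum completion_order order)
{
    long new_size;

    assert(f != 0 && len >= 0);
    new_size = f->i_size + len;

    if (order == COMPLETE_BEFORE_SIZE_UPDATE) {
        waiter_sees_completion(f); /* event posted first ...   */
        f->i_size = new_size;      /* ... size updated too late */
    } else {
        f->i_size = new_size;      /* size updated first ...    */
        waiter_sees_completion(f); /* ... then the event shows  */
    }
    return f->size_at_event;
}
```

In this model, a 512-byte append to a 4096-byte file makes the waiter see 4096 under the first ordering and 4608 under the second, which matches the symptom in the report (fstat() not reflecting the completed write).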
I won't have time to craft a patch today, I don't think, but I'll try during OLS
next week if no one beats me to it.
I reproduced the failure under 2.6.18-rc1-mm2 on a dual athlon. (b.k.o yelled
at me when I tried to add it to the kernel version field.)
Zach, I'm not sure I understand you, as you are only watching this thread.
> I won't have time to craft a patch today,
Do you mean you are going to create the patch?
> You mean You going to create the patch?
Yeah, I sent it off to the kernel mailing list but haven't gotten any response.
We need to make sure that this fix doesn't break other uses of O_DIRECT and aio
before we can merge it into the kernel.
Zach, I'm guessing that this patch is obsoleted by the patch series you sent out
on September 5th (which cleans up the error handling in the dio code). Is that
right?
Yeah, the 'dio: clean up completion phase' patch series addresses this problem.
It's the final patch in the series that does it:
The hunks that remove the EIOCBQUEUED translation from dio's callers are the
We're still working on testing the patch series. We had an unrelated hardware
failure that is slowing things down :/. It's been solid enough so far that I
might throw it in -mm once .19 opens.
Zach, for the past few days I've been struggling with strange AIO behaviour.
I have an HDD with some bad blocks (or rather more than some ;). In the general
case AIO reads seem to operate OK (they return EIO), but very rarely a read
returns an unmodified read buffer and no error. The kernel is 2.6.9-34.ELsmp.
I suspect an AIO bug, but have no clear evidence. Do you happen to know of any
such bug?
> but very rarely read returns unmodified
> read-buffer and no error. The kernel is 2.6.9-34.ELsmp. I suspect aio bug - but
> no clear evidence. Do You happen to know anything of such possible bug?
Hmm, I could imagine cases where it might lose an error that was generated from
block lookups but I don't know of any bugs which would specifically do this.
Is it possible that file system corruption has caused the file size to shrink
and for reads to be issued past the end of the file? That would lead to reads
that succeed with 0 bytes read.
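The end-of-file behaviour mentioned here can be checked with a short sketch on a regular file. It uses synchronous pread() rather than AIO to keep the example small, and the helper name `pread_from_small_file()`, the 4-byte file contents and the 'X' sentinel are assumptions made for illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Create a 4-byte temporary file, fill `buf` with an 'X' sentinel,
 * then pread() `len` bytes at `offset`. Returns pread()'s result,
 * or -1 if the temporary file could not be set up. */
ssize_t pread_from_small_file(off_t offset, char *buf, size_t len)
{
    static const char contents[] = "abcd";
    char path[] = "/tmp/eof-read-XXXXXX";
    ssize_t n;
    int fd;

    assert(buf != NULL);
    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);               /* file disappears once fd is closed */

    if (write(fd, contents, 4) != 4) {
        close(fd);
        return -1;
    }

    memset(buf, 'X', len);      /* sentinel to detect untouched bytes */
    n = pread(fd, buf, len, offset);
    close(fd);
    return n;
}
```

Calling it with an offset of 100 should return 0 and leave every byte of the buffer as 'X', with no error reported; calling it with offset 0 and an 8-byte buffer should return 4 and overwrite only the first four bytes.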
Are you in a position where you can try the patches to the kernel that were
referred to in comment 7?
Can you try the
> Is it possible that file system corruption has caused the file size to shrink
> and for reads to be issued past the end of the file? That would lead to reads
> that succeed with 0 bytes read.
I'm reading a raw device in my test. A read of 0 bytes with an unmodified buffer
would be OK for me, but what I'm getting is a read reported as 4K of data with
the 4K buffer unmodified and no error.
> Are you in a position where you can try the patches to the kernel that were
> referred to in comment 7?
I may try to try ;) That's because I'm not allowed to use a kernel other than
the official RH one and obviously can't run the test on another machine ;)
It's going to take a while.
The first race that this bug was filed for was fixed in mainline in commit 8459d86aff04fa53c2ab6a6b9f355b3063cc8014 about a year ago.
We have a test which makes sure this bug doesn't regress:
So this bug can be closed. Any further bugs with distribution kernels should be taken up with the distribution provider.