Bug 201257
| Summary: | SCSI write error not seen by Linux AIO? | | |
|---|---|---|---|
| Product: | IO/Storage | Reporter: | dchen |
| Component: | AIO | Assignee: | Badari Pulavarty (pbadari) |
| Status: | RESOLVED PATCH_ALREADY_AVAILABLE | | |
| Severity: | normal | | |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 4.9.129 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
Attachments:
- test program used to do Linux AIO
- kludge to scsi_debug to return error on write
- Test log for unexpected behavior on 4.9.129
- Test log for expected behavior on 4.18.10
Description
dchen
2018-09-27 22:11:23 UTC
Created attachment 278805
kludge to scsi_debug to return error on write
Created attachment 278807
Test log for unexpected behavior on 4.9.129
Created attachment 278809
Test log for expected behavior on 4.18.10
It could be that I'm seeing this unexpected (to me) behavior because of some quirk of the scsi_debug fake SCSI device. However, I originally ran into this behavior when testing against my company's ClearSky Storage, using both iSCSI and Fibre Channel targets. Since I've seen it on these different targets, I think it's unlikely to be a scsi_debug quirk. I should add that when I use dd to do a standard write() to the SCSI device, instead of using Linux AIO, I get an I/O error as expected. Likewise, when I use device mapper to create a flakey device instead of using a SCSI device (e.g. dmsetup create dchen-test --table="0 `blockdev --getsz /dev/sdg` flakey /dev/sdg 0 9 1"), I get an I/O error as expected.

This bug is fixed by the following commit:

commit 41e817bca3acd3980efe5dd7d28af0e6f4ab9247
Author: Maximilian Heyne <mheyne@amazon.de>
Date: Fri Nov 30 08:35:14 2018 -0700

fs: fix lost error code in dio_complete

commit e259221763a40403d5bb232209998e8c45804ab8 ("fs: simplify the generic_write_sync prototype") reworked callers of generic_write_sync(), and ended up dropping the error return for the directio path. Prior to that commit, in dio_complete(), an error would be bubbled up the stack, but after that commit, errors passed on to dio_complete were eaten up.

This was reported on the list earlier, and a fix was proposed in https://lore.kernel.org/lkml/20160921141539.GA17898@infradead.org/, but never followed up with. We recently hit this bug in our testing where fencing io errors, which were previously erroring out with EIO, were being returned as success operations after this commit.

The fix proposed on the list earlier was a little short -- it would have still called generic_write_sync() in case `ret` already contained an error. This fix ensures generic_write_sync() is only called when there's no pending error in the write. Additionally, transferred is replaced with ret to bring this code in line with other callers.

Fixes: e259221763a4 ("fs: simplify the generic_write_sync prototype")
Reported-by: Ravi Nankani <rnankani@amazon.com>
Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
CC: Torsten Mehlan <tomeh@amazon.de>
CC: Uwe Dannowski <uwed@amazon.de>
CC: Amit Shah <aams@amazon.de>
CC: David Woodhouse <dwmw@amazon.co.uk>
CC: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
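To make the commit message concrete, here is a simplified user-space model of the error propagation it describes. It is a sketch, not kernel code: all `model_*` names are hypothetical, and it assumes (as we read the commit) that before the fix the asynchronous completion branch of dio_complete() did roughly `ret = generic_write_sync(iocb, transferred)`, while after the fix it does `if (ret > 0) ret = generic_write_sync(iocb, ret)`. Because generic_write_sync() only ever returns its own sync error or the count it was handed, an -EIO already held in `ret` is overwritten by a byte count, so the AIO completion reports success.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

/* Hypothetical model of generic_write_sync(): it only sees a byte count.
 * sync_err != 0 models a failed O_DSYNC flush; 0 models "no sync needed"
 * or a successful flush. It cannot know about an earlier I/O error. */
static ssize_t model_generic_write_sync(ssize_t count, int sync_err)
{
    if (sync_err)
        return -sync_err;
    return count;
}

/* Model of the async completion before the fix: the sync result
 * unconditionally replaces ret, so a pending error such as -EIO is lost. */
static ssize_t model_complete_before(ssize_t ret, ssize_t transferred, int sync_err)
{
    ret = model_generic_write_sync(transferred, sync_err);
    return ret; /* value that would be passed to ki_complete() */
}

/* Model of the async completion after the fix: only sync when there is
 * no pending error, and pass ret instead of transferred. */
static ssize_t model_complete_after(ssize_t ret, int sync_err)
{
    if (ret > 0)
        ret = model_generic_write_sync(ret, sync_err);
    return ret;
}
```

For example, a 4096-byte write whose I/O failed (`ret == -EIO`) with `transferred == 4096` comes back as 4096 from the "before" model but as -EIO from the "after" model. The model deliberately leaves out ki_pos updates and the synchronous path.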