Bug 80961

Summary: [PATCH]register_bcache return value does not distinguish between transient and permanent failures
Product: IO/Storage Reporter: Ian Pilcher (arequipeno)
Component: Block LayerAssignee: Jens Axboe (axboe)
Status: RESOLVED WILL_NOT_FIX    
Severity: normal CC: alan
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 3.15.6 Subsystem:
Regression: No Bisected commit-id:
Attachments: Make register_bcache distinguish (possibly) non-fatal errors

Description Ian Pilcher 2014-07-23 03:36:17 UTC
Created attachment 143981 [details]
Make register_bcache distinguish (possibly) non-fatal errors

When registering a bcache device (cache or backing device), register_bcache attempts to acquire exclusive access to the device, and returns an error if it fails to do so.  In the error path, it checks whether the device is already registered as a bcache device, and it logs a different error message in this case ("device already registered" vs. "device busy"), but it does not return a different value.  It returns -EINVAL for all errors.

This means that userspace has no way of distinguishing between the "device busy" case, which should potentially be retried and the "already registered" case (or other errors), which are presumably fatal.

This is a real problem.  I have found that registration of a partition on an MD RAID (imsm) device fails about 50% of the time when triggered by udev.  Changing register_bcache's return value to distinguish between the two cases and modifying the udev helper to retry in the "device busy" case appears to fix the problem.

The attached patch changed register_bcache to return -EBUSY in the case where it cannot obtain exclusive access to the device, but it is not already registered with bcache.  It still returns -EINVAL for all other errors.
Comment 1 Alan 2014-07-29 15:21:50 UTC
Please see Documentation/SubmittingPatches

We can't take patches via bugzilla as we need a Signed-off-by:
Comment 2 Ian Pilcher 2014-07-30 00:25:06 UTC
(In reply to Alan from comment #1)
> We can't take patches via bugzilla as we need a Signed-off-by:

It never occurred to me that the patch would go in as is.  I don't even know if it applies against the latest development tree (and I've long since given up playing kernel patch Whack-a-Mole).

This is a bug report, that happens to include a suggested approach to fixing the issue.

Since all roads seem to lead to bcache-devel, and no one there appears interested ...