Bug 61061

Summary: NVMe driver fails to wait/poll for the "namespace not ready" case and "buffer IO errors" are seen from block layer
Product: Drivers
Component: Other
Status: NEW
Severity: normal
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.11
Subsystem:
Regression: No
Bisected commit-id:
Reporter: shiro.itou
Assignee: drivers_other
CC: alan, kbusch, matthew, shiro.itou

Description shiro.itou 2013-09-09 06:47:01 UTC
When a new NVMe device is inserted, or when the slot is power-cycled with

echo 0 > /sys/bus/pci/....slots/...power
echo 1 > /sys/bus/pci/....slots/...power

the following errors are seen:

 nvme0n1:
Buffer I/O error on device nvme0n1, logical block 0
Buffer I/O error on device nvme0n1, logical block 0
Buffer I/O error on device nvme0n1, logical block 0
Buffer I/O error on device nvme0n1, logical block 0
Buffer I/O error on device nvme0n1, logical block 0
Buffer I/O error on device nvme0n1, logical block 0
Buffer I/O error on device nvme0n1, logical block 0
Dev nvme0n1: unable to read RDB block 0
Buffer I/O error on device nvme0n1, logical block 0
Buffer I/O error on device nvme0n1, logical block 0
 unable to read partition table
Comment 1 Keith Busch 2014-03-11 19:56:18 UTC
I submitted this patch a few weeks ago:

http://merlin.infradead.org/pipermail/linux-nvme/2014-February/000701.html

This will re-queue all failed I/O that is retryable (i.e. the "Do Not Retry" bit is not set in the completion status) for up to one minute.
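
As a rough illustration of that retry decision (a minimal sketch, not the code from the patch): the two status constants below match the values in the Linux NVMe header, while should_requeue() and RETRY_WINDOW_SECONDS are hypothetical names introduced here, and elapsed time is passed in as plain seconds rather than jiffies.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status values as defined in the Linux NVMe header. The status word is
 * the completion entry's status field with the phase bit shifted out. */
#define NVME_SC_NS_NOT_READY 0x82
#define NVME_SC_DNR          0x4000

/* Hypothetical constant: the "up to one minute" retry window. */
#define RETRY_WINDOW_SECONDS 60u

/* Hypothetical helper: should a completed command be put back on the queue?
 * elapsed_seconds is the time since the command was first submitted. */
bool should_requeue(uint16_t status, unsigned int elapsed_seconds)
{
    if (status == 0)
        return false;   /* command succeeded, nothing to retry */
    if (status & NVME_SC_DNR)
        return false;   /* device set "Do Not Retry" */
    /* Retryable failure: requeue only while inside the retry window. */
    return elapsed_seconds < RETRY_WINDOW_SECONDS;
}
```

Under these assumptions, a "Namespace Not Ready" completion (0x82) seen a few seconds after submission is requeued, while the same status with the DNR bit set, or one seen after the window has expired, is failed up to the block layer.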

I don't have the device from this BZ to test with, but I modified my drive's firmware to return "Namespace Not Ready" for several seconds after the controller status becomes ready. All commands failed and were retried several thousand times until a success status was returned; no "Buffer I/O error" messages were logged, and the partition table was read successfully.
Comment 2 Keith Busch 2014-05-09 19:52:23 UTC
I have it on good authority that the patch set I submitted fixes hot insertion errors on many different devices and platforms. Can the submitter confirm this using the latest upstream kernel and update the bug accordingly?