When a new NVMe device is inserted, or when the slot is power-cycled with

    echo 0 > /sys/bus/pci/....slots/...power
    echo 1 > /sys/bus/pci/....slots/...power

the following errors are seen:

    nvme0n1: Buffer I/O error on device nvme0n1, logical block 0
    Buffer I/O error on device nvme0n1, logical block 0
    Buffer I/O error on device nvme0n1, logical block 0
    Buffer I/O error on device nvme0n1, logical block 0
    Buffer I/O error on device nvme0n1, logical block 0
    Buffer I/O error on device nvme0n1, logical block 0
    Buffer I/O error on device nvme0n1, logical block 0
    Dev nvme0n1: unable to read RDB block 0
    Buffer I/O error on device nvme0n1, logical block 0
    Buffer I/O error on device nvme0n1, logical block 0
    unable to read partition table
I submitted this patch a few weeks ago: http://merlin.infradead.org/pipermail/linux-nvme/2014-February/000701.html

It re-queues all failed I/O that is retryable (i.e. the "Do Not Retry" bit is not set in the completion status) for up to one minute. I don't have the device from this BZ to test with, but I modified my drive's firmware to return "Namespace Not Ready" for several seconds after the controller status becomes ready. Every command failed and was retried several thousand times until a success status was returned; no "Buffer I/O error" messages were logged, and the partition table was read successfully.
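The retry policy described above can be illustrated with a small user-space simulation. This is only a sketch, not the code from the patch: `submit_with_retry`, `is_retryable`, and `FakeDrive` are hypothetical names made up for illustration, and the fake clock in integer milliseconds is an assumption made so the example runs instantly. The status constants are assumed to match the values used in the Linux NVMe headers (DNR bit `0x4000`, "Namespace Not Ready" `0x82`).

```python
# Illustrative simulation only; this is not the kernel patch.
NVME_SC_SUCCESS = 0x0
NVME_SC_NS_NOT_READY = 0x82   # "Namespace Not Ready" status code (assumed value)
NVME_SC_DNR = 0x4000          # "Do Not Retry" bit in the completion status (assumed value)
RETRY_WINDOW_MS = 60_000      # retry budget of one minute, as described above


def is_retryable(status):
    """A failed command is retryable only if the DNR bit is clear."""
    return status != NVME_SC_SUCCESS and not (status & NVME_SC_DNR)


def submit_with_retry(submit, now_ms, retry_window_ms=RETRY_WINDOW_MS):
    """Resubmit until success, a non-retryable failure, or window expiry.

    submit() returns a completion status; now_ms() returns the current time
    in milliseconds. Returns (final_status, attempts).
    """
    start = now_ms()
    attempts = 0
    while True:
        status = submit()
        attempts += 1
        if status == NVME_SC_SUCCESS:
            return status, attempts
        if not is_retryable(status) or now_ms() - start >= retry_window_ms:
            return status, attempts


class FakeDrive:
    """Hypothetical drive that reports 'Namespace Not Ready' until ready_at_ms."""

    def __init__(self, ready_at_ms, step_ms=1):
        self.clock_ms = 0
        self.ready_at_ms = ready_at_ms
        self.step_ms = step_ms

    def now_ms(self):
        return self.clock_ms

    def submit(self):
        # Each submission advances the fake clock by one step.
        self.clock_ms += self.step_ms
        if self.clock_ms >= self.ready_at_ms:
            return NVME_SC_SUCCESS
        return NVME_SC_NS_NOT_READY


if __name__ == "__main__":
    drive = FakeDrive(ready_at_ms=5_000)
    status, attempts = submit_with_retry(drive.submit, drive.now_ms)
    print(hex(status), attempts)
```

In this model a drive that becomes ready after a few seconds sees thousands of retries and then a success, while a status with the DNR bit set is returned to the caller after a single attempt. A real driver would re-queue the request asynchronously rather than loop.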
I have it on good authority that the patch set I submitted fixes hot insertion errors on many different devices and platforms. Can the submitter confirm this using the latest upstream kernel and update the bug accordingly?