Bug 61061 - NVMe driver fails to wait/poll for the "namespace not ready" case and "buffer IO errors" are seen from block layer
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-09-09 06:47 UTC by shiro.itou
Modified: 2014-05-09 19:52 UTC

See Also:
Kernel Version: 3.11
Subsystem:
Regression: No
Bisected commit-id:



Description shiro.itou 2013-09-09 06:47:01 UTC
When a new NVMe device is inserted, or when the slot is power-cycled with
echo 0 > /sys/bus/pci/....slots/...power
echo 1 > /sys/bus/pci/....slots/...power

the following errors are seen:

 nvme0n1:
Buffer I/O error on device nvme0n1, logical block 0
Buffer I/O error on device nvme0n1, logical block 0
Buffer I/O error on device nvme0n1, logical block 0
Buffer I/O error on device nvme0n1, logical block 0
Buffer I/O error on device nvme0n1, logical block 0
Buffer I/O error on device nvme0n1, logical block 0
Buffer I/O error on device nvme0n1, logical block 0
Dev nvme0n1: unable to read RDB block 0
Buffer I/O error on device nvme0n1, logical block 0
Buffer I/O error on device nvme0n1, logical block 0
 unable to read partition table
Comment 1 Keith Busch 2014-03-11 19:56:18 UTC
I submitted this patch a few weeks ago:

http://merlin.infradead.org/pipermail/linux-nvme/2014-February/000701.html

This will re-queue all failed IO that is retryable (i.e. "Do Not Retry" bit is not set in completion status) for up to one minute.

I don't have the device from this BZ to test with, but I modified my drive's firmware to return "Namespace not Ready" for several seconds after the controller status becomes ready. All commands failed and were retried several thousand times until a success status was returned; no "Buffer I/O error" messages were logged, and the partition table was successfully read.
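
For readers following along, the retry rule described above can be sketched in userspace C. This is not the code from the patch: the helper names (io_should_retry, fake_submit, submit_with_retry), the struct fake_ns, and the simulated clock are all hypothetical, made up for illustration. The status values are assumed to follow the Linux NVMe header convention (0x82 for "Namespace not Ready", 0x4000 for the "Do Not Retry" bit), and the one-minute window is taken from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the Linux NVMe header convention: the completion
 * status field with the phase bit already shifted out. */
#define SC_SUCCESS       0x0000
#define SC_NS_NOT_READY  0x0082
#define SC_DNR           0x4000   /* "Do Not Retry" bit */

#define RETRY_WINDOW_MS  60000u   /* "up to one minute" */

/* Hypothetical helper: should a completed command be re-queued?
 * Only failures without the DNR bit, and only inside the window. */
static bool io_should_retry(uint16_t status, uint32_t elapsed_ms)
{
    if (status == SC_SUCCESS)
        return false;             /* nothing to retry */
    if (status & SC_DNR)
        return false;             /* device asked us not to retry */
    return elapsed_ms < RETRY_WINDOW_MS;
}

/* Hypothetical stand-in for a namespace that reports "not ready"
 * until ready_at_ms on a simulated clock. */
struct fake_ns {
    uint32_t ready_at_ms;
};

static uint16_t fake_submit(const struct fake_ns *ns, uint32_t now_ms)
{
    return now_ms < ns->ready_at_ms ? SC_NS_NOT_READY : SC_SUCCESS;
}

/* Submit one command and re-queue it while it is retryable.
 * step_ms models the time between attempts and must be > 0.
 * Returns the final status; *attempts counts every submission. */
static uint16_t submit_with_retry(const struct fake_ns *ns,
                                  uint32_t step_ms, unsigned *attempts)
{
    uint32_t now_ms = 0;
    uint16_t status;

    *attempts = 0;
    for (;;) {
        status = fake_submit(ns, now_ms);
        (*attempts)++;
        if (!io_should_retry(status, now_ms))
            return status;        /* success, DNR set, or window expired */
        now_ms += step_ms;
    }
}
```

With a fake namespace that becomes ready after 5 seconds and a 1 ms step, submit_with_retry ends in success after a few thousand attempts, which is roughly the behaviour reported with the modified firmware. A namespace that never becomes ready within the minute is eventually failed back to the caller with the last status.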
Comment 2 Keith Busch 2014-05-09 19:52:23 UTC
I have it on good authority that the patch set I submitted fixes hot insertion errors on many different devices and platforms. Can the submitter confirm this using the latest upstream kernel and update the bug accordingly?
