Bug 11843
Description
Rafael J. Wysocki
2008-10-25 12:16:42 UTC
This bug is still present. There are a couple o patches I have to test while running usbmon, I'll post the trace here. The dmesg and usbmon output follows, and here's the explanation for patches A, B and C: patch A: report underflow with no sense data (commit 8bfa24727087d7252f9ecfb5fea2dfc92d797fbd) patch B: reduce req->retries and return EIO if reached 0 patch C: unusual_devs entry for this specific usb enclosure Summary of results: 2.6.26.5: OK 2.6.27.4: Infinite Sense errors 2.6.27.4 + A: Infinite Sense errors 2.6.27.4 + A + B: Two Sense errors, and I/O errors from sector 781422761 to 781422768 (reported as last sector) 2.6.27.4 + A + B + C: No Sense error, but I/O of sector 781422761 (reported last sector is: 781422767) 2.6.27.4 + A + C: OK 2.6.27.4 + C: OK Created attachment 18621 [details]
kernel messages for 2.6.26.5
Created attachment 18622 [details]
kernel messages for 2.6.27.4
Created attachment 18623 [details] kernel messages for 2.6.27.4 with commit 8bfa24727087d7252f9ecfb5fea2dfc92d797fbd The commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8bfa24727087d7252f9ecfb5fea2dfc92d797fbd Created attachment 18624 [details]
kernel messages for 2.6.27.4 with A and B
The B patch:
Index: 2.6.27.4/drivers/scsi/scsi_lib.c
===================================================================
--- 2.6.27.4.orig/drivers/scsi/scsi_lib.c
+++ 2.6.27.4/drivers/scsi/scsi_lib.c
@@ -611,6 +611,11 @@ static void scsi_requeue_command(struct
struct request *req = cmd->request;
unsigned long flags;
+ if (--req->retries <= 0) {
+ blk_end_request(req, -EIO, blk_rq_bytes(req));
+ scsi_next_command(cmd);
+ return;
+ }
scsi_unprep_request(req);
spin_lock_irqsave(q->queue_lock, flags);
blk_requeue_request(q, req);
Created attachment 18625 [details] kernel messages for 2.6.27.4 with A and B and C The C patch is: Index: usb-2.6/drivers/usb/storage/unusual_devs.h =================================================================== --- usb-2.6.orig/drivers/usb/storage/unusual_devs.h +++ usb-2.6/drivers/usb/storage/unusual_devs.h @@ -1251,6 +1251,13 @@ UNUSUAL_DEV( 0x0839, 0x000a, 0x0001, 0x0 US_SC_DEVICE, US_PR_DEVICE, NULL, US_FL_FIX_INQUIRY), +/* Reported by Luciano Rocha <luciano@eurotux.com> */ +UNUSUAL_DEV( 0x0840, 0x0082, 0x0001, 0x0001, + "Argosy", + "Storage", + US_SC_DEVICE, US_PR_DEVICE, NULL, + US_FL_FIX_CAPACITY), + /* Entry and supporting patch by Theodore Kilgore <kilgota@auburn.edu>. * Flag will support Bulk devices which use a standards-violating 32-byte * Command Block Wrapper. Here, the "DC2MEGA" cameras (several brands) with Created attachment 18626 [details]
kernel messages for 2.6.27.4 with A and C
Created attachment 18627 [details]
kernel messages for 2.6.27.4 with C
Created attachment 18628 [details]
usb traffic on 2.6.26.5
Created attachment 18629 [details]
usb traffic on 2.6.27.4
Created attachment 18630 [details]
usb traffic on 2.6.27.4
Created attachment 18631 [details]
usb traffic on 2.6.27.4 with A
Created attachment 18632 [details]
usb traffic on 2.6.27.4 with A and B
Created attachment 18633 [details]
usb traffic on 2.6.27.4 with A and B and C
Created attachment 18634 [details]
usb traffic on 2.6.27.4 with A and C
Created attachment 18635 [details]
usb traffic on 2.6.27.4 with C
At the very least this affects a large number of external USB controllers. Are they really *all* broken (not entirely implausible, but surprising), or do we have a different bug here? My old cheap external USB enclosure has the same issue. It's a "Prolific Technology, Inc. PL3507 ATAI6 Bridge", USB id 067b:3507. The drive itself, FWIW, is a Seagate ST325082. There's an open Ubuntu bug (https://bugs.launchpad.net/bugs/264789) for this with a whole bunch of other external drives/adapters. Created attachment 18686 [details]
kernel messages for 2.6.27.4 with A and B'
New patch replacing B:
Index: 2.6.27.4/drivers/scsi/scsi_lib.c
===================================================================
--- 2.6.27.4.orig/drivers/scsi/scsi_lib.c
+++ 2.6.27.4/drivers/scsi/scsi_lib.c
@@ -611,6 +611,11 @@ static void scsi_requeue_command(struct
struct request *req = cmd->request;
unsigned long flags;
+ if (--req->retries <= 0) {
+ blk_end_request(req, -EIO, blk_rq_bytes(req));
+ scsi_next_command(cmd);
+ return;
+ }
scsi_unprep_request(req);
spin_lock_irqsave(q->queue_lock, flags);
blk_requeue_request(q, req);
@@ -690,6 +695,8 @@ static struct scsi_cmnd *scsi_end_reques
* leftovers in the front of the
* queue, and goose the queue again.
*/
+ if (bytes > 0) /* Made progress */
+ ++req->retries;
scsi_requeue_command(q, cmd);
cmd = NULL;
}
Created attachment 18687 [details]
usb traffic on 2.6.27.4 with A and B'
Created attachment 18688 [details]
kernel messages for 2.6.27.4 with A and B' and C
Created attachment 18689 [details]
usb traffic on 2.6.27.4 with A and B' and C
Tim: I'm not sure exactly what you're referring to. If you mean USB-IDE interfaces that return 0 bytes of data and no sense information when asked to read the last reported sector, then yes, they are broken. In fact, all interfaces which return the number of sectors instead of the last sector number when asked for the drive capacity (thereby getting the value too large by one) are broken. Thanks Alan. Yes, that was what I was referring to. It seems that there are an awful lot of them out there. The unusual_devs route seems to be the right place to "fix" the problem i.e. to have the value transformed to the correct one so that userspace things like udev can try to read raid signatures et al. It's just unfortunate that it seems we're going to need to add a lot of them. Ick. Obviously, removing the infinite retries is also a good thing :-) Notify-Also : Alan Stern <stern@rowland.harvard.edu> Notify-Also : Tim Wright <timw@splhi.com> Created attachment 18851 [details]
kernel messages for 2.6.27.4 with A and B''
Another patch replacing B (and thus B'):
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 62307bd..be44171 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -611,6 +611,11 @@ static void scsi_requeue_command(struct request_queue *q, struct scsi_cmnd *cmd)
struct request *req = cmd->request;
unsigned long flags;
+ if (--req->retries < 0) {
+ blk_end_request(req, -EIO, blk_rq_bytes(req));
+ scsi_next_command(cmd);
+ return;
+ }
scsi_unprep_request(req);
spin_lock_irqsave(q->queue_lock, flags);
blk_requeue_request(q, req);
@@ -690,6 +695,8 @@ static struct scsi_cmnd *scsi_end_request(struct scsi_cmnd *cmd, int error,
* leftovers in the front of the
* queue, and goose the queue again.
*/
+ if (bytes > 0) /* Made progress */
+ ++req->retries;
scsi_requeue_command(q, cmd);
cmd = NULL;
}
Created attachment 18852 [details]
usb traffic on 2.6.27.4 with A and B''
Created attachment 18853 [details]
kernel messages for 2.6.27.4 with A and B'' and C
Created attachment 18854 [details]
usb traffic on 2.6.27.4 with A and B'' and C
Okay, great! My own testing indicates that B'' by itself is enough to get things working. I think we've got this problem licked. The C patch hasn't yet made it into the mainline kernel, but it should get there soon (before 2.6.28-rc6). Handled-By : Luciano Rocha <luciano@eurotux.com> Patch : http://bugzilla.kernel.org/show_bug.cgi?id=11843#c26 Patch C has been accepted into the mainline kernel: commit 8010e06cc90367b4d3fba3b0ec3ced32360ac890. This bug report can be closed. There is a discussion in bugs.launchpad.net concerning this very problem in Ubuntu 08.10., where there are a few affected USB-IDs listed as well: 054c:0298 059b:0177 12c8:1f03 Additionally, Nokia Mobile Phones connected in storage mode are also affected. Link to launchpad: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/264789 The patch in comment #26 should fix the worst part of the problem (the unending loop of "No Sense" messages). (In reply to comment #34) > The patch in comment #26 should fix the worst part of the problem (the > unending > loop of "No Sense" messages). > Yes it should. It is clearly ludicrous for the kernel to infinitely retry and make no progress. Is there some reason that this was closed without the vital change (patch B'' in comment 26) being accepted into mainline? Patch C fixed only one of a myriad of defective devices out there. Can somebody please reopen this, or explain why the change is not being made, because it's still broken at the moment (see the Ubuntu bug for many examples) Part of the reason is that it's not possible to accept path B'' into the mainline. The kernel has moved on since 2.6.27 and the logic in the SCSI midlayer has changed; the patch no longer applies. Another part of the reason is that SCSI effectively has no maintainer at the moment. Yet another part is the it's far from clear that patch B'' is really correct. That is, there probably are other devices (certain kinds of SCSI tape drives) that depend on the code being the way it is; the patch would cause them to stop working. Finally, additional changes to usb-storage in the last month or so should have solved the problem with Nokia phones reported in the Ubuntu bug report. Sorry, you're absolutely correct. I downloaded and built 2.6.29-rc5, and it works flawlessly. Of course, it also has the correct unusual_devs[] array entry for my broken bridge anyways. |