Bug 11843

Summary: usb hdd problems with 2.6.27.2
Product: Drivers Reporter: Rafael J. Wysocki (rjw)
Component: USBAssignee: Greg Kroah-Hartman (greg)
Status: CLOSED CODE_FIX    
Severity: normal CC: josi, luciano, stern, timw, vdinamico
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.27.2 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 11167    
Attachments: kernel messages for 2.6.26.5
kernel messages for 2.6.27.4
kernel messages for 2.6.27.4 with commit 8bfa24727087d7252f9ecfb5fea2dfc92d797fbd
kernel messages for 2.6.27.4 with A and B
kernel messages for 2.6.27.4 with A and B and C
kernel messages for 2.6.27.4 with A and C
kernel messages for 2.6.27.4 with C
usb traffic on 2.6.26.5
usb traffic on 2.6.27.4
usb traffic on 2.6.27.4
usb traffic on 2.6.27.4 with A
usb traffic on 2.6.27.4 with A and B
usb traffic on 2.6.27.4 with A and B and C
usb traffic on 2.6.27.4 with A and C
usb traffic on 2.6.27.4 with C
kernel messages for 2.6.27.4 with A and B'
usb traffic on 2.6.27.4 with A and B'
kernel messages for 2.6.27.4 with A and B' and C
usb traffic on 2.6.27.4 with A and B' and C
kernel messages for 2.6.27.4 with A and B''
usb traffic on 2.6.27.4 with A and B''
kernel messages for 2.6.27.4 with A and B'' and C
usb traffic on 2.6.27.4 with A and B'' and C

Description Rafael J. Wysocki 2008-10-25 12:16:42 UTC
Subject    : usb hdd problems with 2.6.27.2
Submitter  : Luciano Rocha <luciano@eurotux.com>
Date       : 2008-10-22 16:22
References : http://marc.info/?l=linux-kernel&m=122469318102679&w=4

This entry is being used for tracking a regression from 2.6.26.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 Luciano Rocha 2008-11-03 03:32:09 UTC
This bug is still present. There are a couple o patches I have to test while running usbmon, I'll post the trace here.
Comment 2 Luciano Rocha 2008-11-03 07:41:53 UTC
The dmesg and usbmon output follows, and here's the explanation for patches A, B and C:

patch A: report underflow with no sense data
         (commit 8bfa24727087d7252f9ecfb5fea2dfc92d797fbd)
patch B: reduce req->retries and return EIO if reached 0
patch C: unusual_devs entry for this specific usb enclosure

Summary of results:
2.6.26.5: OK
2.6.27.4: Infinite Sense errors
2.6.27.4 + A: Infinite Sense errors
2.6.27.4 + A + B: Two Sense errors, and I/O errors from sector 781422761 to 781422768 (reported as last sector)
2.6.27.4 + A + B + C: No Sense error, but I/O of sector 781422761 (reported last sector is: 781422767)
2.6.27.4 + A + C: OK
2.6.27.4 + C: OK
Comment 3 Luciano Rocha 2008-11-03 07:42:30 UTC
Created attachment 18621 [details]
kernel messages for 2.6.26.5
Comment 4 Luciano Rocha 2008-11-03 07:42:49 UTC
Created attachment 18622 [details]
kernel messages for 2.6.27.4
Comment 5 Luciano Rocha 2008-11-03 07:43:24 UTC
Created attachment 18623 [details]
kernel messages for 2.6.27.4 with commit 8bfa24727087d7252f9ecfb5fea2dfc92d797fbd

The commit:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8bfa24727087d7252f9ecfb5fea2dfc92d797fbd
Comment 6 Luciano Rocha 2008-11-03 07:44:21 UTC
Created attachment 18624 [details]
kernel messages for 2.6.27.4 with A and B

The B patch:

Index: 2.6.27.4/drivers/scsi/scsi_lib.c
===================================================================
--- 2.6.27.4.orig/drivers/scsi/scsi_lib.c
+++ 2.6.27.4/drivers/scsi/scsi_lib.c
@@ -611,6 +611,11 @@ static void scsi_requeue_command(struct 
        struct request *req = cmd->request;
        unsigned long flags;
 
+       if (--req->retries <= 0) {
+               blk_end_request(req, -EIO, blk_rq_bytes(req));
+               scsi_next_command(cmd);
+               return;
+       }
        scsi_unprep_request(req);
        spin_lock_irqsave(q->queue_lock, flags);
        blk_requeue_request(q, req);
Comment 7 Luciano Rocha 2008-11-03 07:45:00 UTC
Created attachment 18625 [details]
kernel messages for 2.6.27.4 with A and B and C

The C patch is:

Index: usb-2.6/drivers/usb/storage/unusual_devs.h
===================================================================
--- usb-2.6.orig/drivers/usb/storage/unusual_devs.h
+++ usb-2.6/drivers/usb/storage/unusual_devs.h
@@ -1251,6 +1251,13 @@ UNUSUAL_DEV( 0x0839, 0x000a, 0x0001, 0x0
                US_SC_DEVICE, US_PR_DEVICE, NULL,
                US_FL_FIX_INQUIRY),
 
+/* Reported by Luciano Rocha <luciano@eurotux.com> */
+UNUSUAL_DEV( 0x0840, 0x0082, 0x0001, 0x0001,
+               "Argosy",
+               "Storage",
+               US_SC_DEVICE, US_PR_DEVICE, NULL,
+               US_FL_FIX_CAPACITY),
+
 /* Entry and supporting patch by Theodore Kilgore <kilgota@auburn.edu>.
  * Flag will support Bulk devices which use a standards-violating 32-byte
  * Command Block Wrapper. Here, the "DC2MEGA" cameras (several brands) with
Comment 8 Luciano Rocha 2008-11-03 07:45:22 UTC
Created attachment 18626 [details]
kernel messages for 2.6.27.4 with A and C
Comment 9 Luciano Rocha 2008-11-03 07:45:42 UTC
Created attachment 18627 [details]
kernel messages for 2.6.27.4 with C
Comment 10 Luciano Rocha 2008-11-03 07:46:10 UTC
Created attachment 18628 [details]
usb traffic on 2.6.26.5
Comment 11 Luciano Rocha 2008-11-03 07:46:39 UTC
Created attachment 18629 [details]
usb traffic on 2.6.27.4
Comment 12 Luciano Rocha 2008-11-03 07:48:04 UTC
Created attachment 18630 [details]
usb traffic on 2.6.27.4
Comment 13 Luciano Rocha 2008-11-03 07:48:51 UTC
Created attachment 18631 [details]
usb traffic on 2.6.27.4 with A
Comment 14 Luciano Rocha 2008-11-03 07:50:05 UTC
Created attachment 18632 [details]
usb traffic on 2.6.27.4 with A and B
Comment 15 Luciano Rocha 2008-11-03 07:50:26 UTC
Created attachment 18633 [details]
usb traffic on 2.6.27.4 with A and B and C
Comment 16 Luciano Rocha 2008-11-03 07:50:49 UTC
Created attachment 18634 [details]
usb traffic on 2.6.27.4 with A and C
Comment 17 Luciano Rocha 2008-11-03 07:51:04 UTC
Created attachment 18635 [details]
usb traffic on 2.6.27.4 with C
Comment 18 Tim Wright 2008-11-03 11:36:02 UTC
At the very least this affects a large number of external USB controllers. Are they really *all* broken (not entirely implausible, but surprising), or do we have a different bug here? My old cheap external USB enclosure has the same issue. It's a "Prolific Technology, Inc. PL3507 ATAI6 Bridge", USB id 067b:3507. The drive itself, FWIW, is a Seagate ST325082. There's an open Ubuntu bug (https://bugs.launchpad.net/bugs/264789) for this with a whole bunch of other external drives/adapters.
Comment 19 Luciano Rocha 2008-11-05 02:23:03 UTC
Created attachment 18686 [details]
kernel messages for 2.6.27.4 with A and B'

New patch replacing B:

Index: 2.6.27.4/drivers/scsi/scsi_lib.c
===================================================================
--- 2.6.27.4.orig/drivers/scsi/scsi_lib.c
+++ 2.6.27.4/drivers/scsi/scsi_lib.c
@@ -611,6 +611,11 @@ static void scsi_requeue_command(struct 
        struct request *req = cmd->request;
        unsigned long flags;
 
+       if (--req->retries <= 0) {
+               blk_end_request(req, -EIO, blk_rq_bytes(req));
+               scsi_next_command(cmd);
+               return;
+       }
        scsi_unprep_request(req);
        spin_lock_irqsave(q->queue_lock, flags);
        blk_requeue_request(q, req);
@@ -690,6 +695,8 @@ static struct scsi_cmnd *scsi_end_reques
                                 * leftovers in the front of the
                                 * queue, and goose the queue again.
                                 */
+                               if (bytes > 0)          /* Made progress */
+                                       ++req->retries;
                                scsi_requeue_command(q, cmd);
                                cmd = NULL;
                        }
Comment 20 Luciano Rocha 2008-11-05 02:23:36 UTC
Created attachment 18687 [details]
usb traffic on 2.6.27.4 with A and B'
Comment 21 Luciano Rocha 2008-11-05 02:24:01 UTC
Created attachment 18688 [details]
kernel messages for 2.6.27.4 with A and B' and C
Comment 22 Luciano Rocha 2008-11-05 02:24:22 UTC
Created attachment 18689 [details]
usb traffic on 2.6.27.4 with A and B' and C
Comment 23 Alan Stern 2008-11-05 09:10:26 UTC
Tim: I'm not sure exactly what you're referring to.  If you mean USB-IDE interfaces that return 0 bytes of data and no sense information when asked to read the last reported sector, then yes, they are broken.  In fact, all interfaces which return the number of sectors instead of the last sector number when asked for the drive capacity (thereby getting the value too large by one) are broken.
Comment 24 Tim Wright 2008-11-05 10:46:26 UTC
Thanks Alan. Yes, that was what I was referring to. It seems that there are an awful lot of them out there. The unusual_devs route seems to be the right place to "fix" the problem i.e. to have the value transformed to the correct one so that userspace things like udev can try to read raid signatures et al. It's just unfortunate that it seems we're going to need to add a lot of them. Ick.

Obviously, removing the infinite retries is also a good thing :-)
Comment 25 Rafael J. Wysocki 2008-11-09 13:06:43 UTC
Notify-Also : Alan Stern <stern@rowland.harvard.edu>
Notify-Also : Tim Wright <timw@splhi.com>
Comment 26 Luciano Rocha 2008-11-13 09:06:58 UTC
Created attachment 18851 [details]
kernel messages for 2.6.27.4 with A and B''

Another patch replacing B (and thus B'):

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 62307bd..be44171 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -611,6 +611,11 @@ static void scsi_requeue_command(struct request_queue *q, struct scsi_cmnd *cmd)
        struct request *req = cmd->request;
        unsigned long flags;
 
+       if (--req->retries < 0) {
+               blk_end_request(req, -EIO, blk_rq_bytes(req));
+               scsi_next_command(cmd);
+               return;
+       }
        scsi_unprep_request(req);
        spin_lock_irqsave(q->queue_lock, flags);
        blk_requeue_request(q, req);
@@ -690,6 +695,8 @@ static struct scsi_cmnd *scsi_end_request(struct scsi_cmnd *cmd, int error,
                                 * leftovers in the front of the
                                 * queue, and goose the queue again.
                                 */
+                               if (bytes > 0)          /* Made progress */
+                                       ++req->retries;
                                scsi_requeue_command(q, cmd);
                                cmd = NULL;
                        }
Comment 27 Luciano Rocha 2008-11-13 09:07:29 UTC
Created attachment 18852 [details]
usb traffic on 2.6.27.4 with A and B''
Comment 28 Luciano Rocha 2008-11-13 09:07:55 UTC
Created attachment 18853 [details]
kernel messages for 2.6.27.4 with A and B'' and C
Comment 29 Luciano Rocha 2008-11-13 09:08:27 UTC
Created attachment 18854 [details]
usb traffic on 2.6.27.4 with A and B'' and C
Comment 30 Alan Stern 2008-11-13 12:14:36 UTC
Okay, great!  My own testing indicates that B'' by itself is enough to get things working.

I think we've got this problem licked.  The C patch hasn't yet made it into the mainline kernel, but it should get there soon (before 2.6.28-rc6).
Comment 31 Rafael J. Wysocki 2008-11-16 11:12:09 UTC
Handled-By : Luciano Rocha <luciano@eurotux.com>
Patch : http://bugzilla.kernel.org/show_bug.cgi?id=11843#c26
Comment 32 Alan Stern 2008-11-16 12:25:11 UTC
Patch C has been accepted into the mainline kernel: commit 8010e06cc90367b4d3fba3b0ec3ced32360ac890.  This bug report can be closed.
Comment 33 Johannes Postler 2008-11-24 06:45:48 UTC
There is a discussion in bugs.launchpad.net concerning this very problem in Ubuntu 08.10., where there are a few affected USB-IDs listed as well:

054c:0298
059b:0177
12c8:1f03

Additionally, Nokia Mobile Phones connected in storage mode are also affected.
Link to launchpad:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/264789
Comment 34 Alan Stern 2008-11-24 07:03:21 UTC
The patch in comment #26 should fix the worst part of the problem (the unending loop of "No Sense" messages).
Comment 35 Tim Wright 2009-02-16 15:54:31 UTC
(In reply to comment #34)
> The patch in comment #26 should fix the worst part of the problem (the
> unending
> loop of "No Sense" messages).
> 

Yes it should. It is clearly ludicrous for the kernel to infinitely retry and make no progress. Is there some reason that this was closed without the vital change (patch B'' in comment 26) being accepted into mainline? Patch C fixed only one of a myriad of defective devices out there.

Can somebody please reopen this, or explain why the change is not being made, because it's still broken at the moment (see the Ubuntu bug for many examples)
Comment 36 Alan Stern 2009-02-16 19:19:00 UTC
Part of the reason is that it's not possible to accept path B'' into the mainline.  The kernel has moved on since 2.6.27 and the logic in the SCSI midlayer has changed; the patch no longer applies.

Another part of the reason is that SCSI effectively has no maintainer at the moment.

Yet another part is the it's far from clear that patch B'' is really correct.  That is, there probably are other devices (certain kinds of SCSI tape drives) that depend on the code being the way it is; the patch would cause them to stop working.

Finally, additional changes to usb-storage in the last month or so should have solved the problem with Nokia phones reported in the Ubuntu bug report.
Comment 37 Tim Wright 2009-02-17 16:11:29 UTC
Sorry, you're absolutely correct. I downloaded and built 2.6.29-rc5, and it works flawlessly. Of course, it also has the correct unusual_devs[] array entry for my broken bridge anyways.