Bug 8892 - USB hard disk broken in 2.6.23-rc3
Summary: USB hard disk broken in 2.6.23-rc3
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: USB
Hardware: All Linux
Importance: P1 high
Assignee: Alan Stern
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-08-15 15:12 UTC by Roman Jarosz
Modified: 2007-09-11 11:55 UTC
CC List: 2 users

See Also:
Kernel Version: 2.6.23-rc3
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
/var/log/messages log (83.09 KB, application/octet-stream)
2007-08-15 15:17 UTC, Roman Jarosz
lsusb -v output (29.25 KB, text/plain)
2007-08-16 09:21 UTC, Roman Jarosz
Fail with idle-time delay set to 30 seconds (165.51 KB, application/octet-stream)
2007-08-16 15:05 UTC, Roman Jarosz
lsusb output after fail (27.00 KB, application/octet-stream)
2007-08-16 15:07 UTC, Roman Jarosz
Update the last_busy field properly (1.42 KB, patch)
2007-08-17 12:22 UTC, Alan Stern
dmesg log with the patch applied (29.99 KB, application/octet-stream)
2007-08-19 03:56 UTC, Roman Jarosz
Messages log with the patch applied (102.13 KB, application/octet-stream)
2007-08-19 03:59 UTC, Roman Jarosz
Fail on resume even with the patch (221.83 KB, application/octet-stream)
2007-08-19 12:39 UTC, Roman Jarosz
message log with 30s delay (63.22 KB, application/octet-stream)
2007-08-21 01:43 UTC, Roman Jarosz
Resume fail with 30s delay when playing music (92.25 KB, application/octet-stream)
2007-08-21 04:29 UTC, Roman Jarosz

Description Roman Jarosz 2007-08-15 15:12:38 UTC
Most recent kernel where this bug did not occur: 2.6.22.1
Distribution: Gentoo x86
Hardware Environment: 
00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS, 943/940GML and 945GT Express Memory Controller Hub (rev 03)
00:01.0 PCI bridge: Intel Corporation Mobile 945GM/PM/GMS, 943/940GML and 945GT Express PCI Express Root Port (rev 03)
00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 02)
00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e2)
00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 02)
01:00.0 VGA compatible controller: nVidia Corporation G72M [Quadro NVS 110M/GeForce Go 7300] (rev a1)
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01)
03:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG Network Connection (rev 02)
04:01.0 CardBus bridge: Ricoh Co Ltd RL5c476 II (rev b3)
04:01.1 FireWire (IEEE 1394): Ricoh Co Ltd R5C552 IEEE 1394 Controller (rev 08)
04:01.2 Generic system peripheral [0805]: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 17)
04:01.3 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter (rev 08)

Software Environment:
gcc 4.2.0

Problem Description:
I was watching a movie from my USB hard disk and, after approximately one hour, the movie stopped and I couldn't read anything from the USB disk. I've reproduced this bug 4 times (with 2.6.23-rc2 and 2.6.23-rc3), and I also tried the patch from http://lkml.org/lkml/2007/8/10/195, but it did not help. The USB hard disk works in 2.6.22.1. I will attach a dmesg log with USB debugging enabled.
Comment 1 Roman Jarosz 2007-08-15 15:17:33 UTC
Created attachment 12397 [details]
/var/log/messages log
Comment 2 Anonymous Emailer 2007-08-15 15:21:15 UTC
Reply-To: akpm@linux-foundation.org

On Wed, 15 Aug 2007 15:05:38 -0700 (PDT)
bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=8892
> 
>            Summary: USB hard disk broken in 2.6.23-rc3
>            Product: Drivers
>            Version: 2.5
>      KernelVersion: 2.6.23-rc3
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: USB
>         AssignedTo: greg@kroah.com
>         ReportedBy: kedgedev@centrum.cz
> 

Michal, can we please add this to the post-2.6.22 regressions list?

Thanks.
Comment 3 Michal Piotrowski 2007-08-16 00:26:41 UTC
This might be a problem with "usb-storage: implement autosuspend" commit 
8dfe4b14869fd185ca25ee88b02ada58a3005eaf

http://lkml.org/lkml/2007/8/9/329
Comment 4 Michal Piotrowski 2007-08-16 00:42:44 UTC
Please provide lsusb output
Comment 5 Alan Stern 2007-08-16 07:18:32 UTC
This is not entirely the fault of the commit mentioned in comment #3, because the device worked perfectly well with numerous autosuspends in the log between 18:54:03 and 19:37:04, when the device died for no apparent reason.

On the other hand, the log also shows that the default idle-time delay for autosuspend, 2 seconds, is too short for that disk.  You should increase it, say to 30 seconds.  The command is

   echo N >/sys/bus/usb/devices/1-4/power/autosuspend

where "N" should be set to 30 or whatever number you choose.  This can be done by hand, as part of a udev script, or any other way you like.
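A minimal shell sketch of the same setting, wrapped in a function so it can be run by hand or called from a script. It assumes the 2.6.23-era `power/autosuspend` attribute (delay in seconds) named in the comment above; the function name `set_autosuspend_delay` and the optional third argument for the sysfs directory (there only so the sketch can be exercised against a scratch directory) are hypothetical, not part of any kernel interface.

```shell
#!/bin/sh
# Hypothetical helper: write an autosuspend idle delay (in seconds) for one
# USB device, assuming the 2.6.23-era sysfs attribute power/autosuspend.
# Usage: set_autosuspend_delay DEVICE SECONDS [SYSFS_DIR]
set_autosuspend_delay() {
    dev="$1"
    delay="$2"
    # Optional third argument (an assumption, for testing); defaults to real sysfs.
    sysfs_dir="${3:-/sys/bus/usb/devices}"
    attr="$sysfs_dir/$dev/power/autosuspend"

    # Do not create a stray file if the attribute does not exist.
    if [ ! -f "$attr" ]; then
        echo "no autosuspend attribute at $attr" >&2
        return 1
    fi
    printf '%s\n' "$delay" > "$attr"
}
```

For the disk in this report, `set_autosuspend_delay 1-4 30` (run as root) is equivalent to the `echo` command above.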
Comment 6 Roman Jarosz 2007-08-16 09:21:11 UTC
Created attachment 12406 [details]
lsusb -v output
Comment 7 Roman Jarosz 2007-08-16 09:23:29 UTC
(In reply to comment #5)
> This is not entirely the fault of the commit mentioned in comment #3, because
> the device worked perfectly well with numerous autosuspends in the log
> between 18:54:03 and 19:37:04, when the device died for no apparent reason.
> 
> On the other hand, the log also shows that the default idle-time delay for
> autosuspend, 2 seconds, is too short for that disk.  You should increase it,
> say to 30 seconds.  The command is
> 
>    echo N >/sys/bus/usb/devices/1-4/power/autosuspend
> 
> where "N" should be set to 30 or whatever number you choose.  This can be
> done by hand, as part of a udev script, or any other way you like.
> 
I will set the idle-time delay to 30 seconds and check whether it works.
Comment 8 Roman Jarosz 2007-08-16 15:05:50 UTC
Created attachment 12410 [details]
Fail with idle-time delay set to 30 seconds

Setting the idle-time delay to 30 seconds didn't help :( As you can see, the USB disk failed on resume (Aug 16 21:38:00 in the log).

Btw, I was messing with my Bluetooth mouse and keyboard, so the log is bigger; sorry for that.
Comment 9 Roman Jarosz 2007-08-16 15:07:09 UTC
Created attachment 12411 [details]
lsusb output after fail
Comment 10 Alan Stern 2007-08-17 12:22:30 UTC
Created attachment 12432 [details]
Update the last_busy field properly

Good testing.  Your log shows that there is a bug in the USB autosuspend code.  This patch ought to fix it.  In fact, you might not even have to set the idle-time delay.  Try it and see.
Comment 11 Roman Jarosz 2007-08-18 01:41:50 UTC
Alan, thanks for the quick fix. It seems that your patch fixed this bug. I tested it yesterday for 2 hours and everything was OK. I'll be testing it today too, but I don't expect any problems.

Btw, is it safe to leave the idle-time delay at 2 seconds? It suspends the USB disk even while I'm watching a movie from it, and then it immediately resumes.

Thanks
Comment 12 Alan Stern 2007-08-18 07:40:52 UTC
I'll submit the bug fix for inclusion in the standard kernel.  But first can you please attach the dmesg log with the patch applied?

You can always increase the delay time to a value larger than 2 seconds.  Whether it is safe or not is hard to say.  The log showed that your device successfully suspended and resumed a number of times even with the bug and the 2-second delay, before it froze up.  I guess in the end you'll have to answer this question yourself, by experimenting.
Comment 13 Roman Jarosz 2007-08-19 03:56:30 UTC
Created attachment 12441 [details]
dmesg log with the patch applied
Comment 14 Roman Jarosz 2007-08-19 03:59:45 UTC
Created attachment 12442 [details]
Messages log with the patch applied

This is the log from yesterday, when I was testing it. The USB disk was on the whole day without any problems. The dmesg log is from today.
Comment 15 Alan Stern 2007-08-19 07:54:14 UTC
Judging from the number of suspend and resume messages in the messages log, it looks like 2 seconds is definitely too short.  Let's see what the log shows with the delay set to 30 seconds.
Comment 16 Roman Jarosz 2007-08-19 08:17:43 UTC
Actually, the attached log with the 30-second delay has a lot of suspend and resume messages too, because I paused the movie, waited until the USB disk suspended, and then started the movie again.
Comment 17 Roman Jarosz 2007-08-19 12:39:31 UTC
Created attachment 12446 [details]
Fail on resume even with the patch

Bad news: the resume failed again at Aug 19 21:10:16 in the log :( But maybe it's because the 2-second idle delay is too small. Anyway, I'm attaching the log.
Comment 18 Alan Stern 2007-08-20 09:12:38 UTC
Yes, definitely 2 seconds is too small.  What does the log show with 30 seconds if you just play movies normally, without lots of deliberate pauses?
Comment 19 Roman Jarosz 2007-08-21 01:43:17 UTC
Created attachment 12466 [details]
message log with 30s delay

Here is the message log with the 30s idle delay. I was watching movies from 19:12:03 to 23:44:47, and it didn't suspend, as I expected.

Today I will do another test: I will play MP3s from the USB disk and the notebook disk with the 30s idle delay, alternating so that one song comes from the USB disk and the next from the notebook disk.
Comment 20 Roman Jarosz 2007-08-21 04:29:22 UTC
Created attachment 12469 [details]
Resume fail with 30s delay when playing music

After 3 hours of playing music, the USB disk failed on resume (Aug 21 12:58:32).

I don't know whether it's the fault of my USB disk/hub or a kernel bug, but if it's a kernel bug, I would like to help debug this problem so it gets fixed.

Thanks
Comment 21 Alan Stern 2007-08-21 07:54:28 UTC
The log in comment #20 shows that you had the idle delay set to 3 seconds, not 30 seconds!  But setting it to 30 wouldn't have helped.  The fact is, your device is just a little buggy.  Most of the time it resumes okay, but sometimes it doesn't.  

Your best approach will be to disable autosuspend entirely.  You can do that very easily by setting the idle delay to -1.  That should solve your problem completely; if it does we can close the bug report.
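Disabling autosuspend is the same sysfs write with -1. Because the log in comment #20 showed a delay of 3 where 30 was intended, a sketch that reads the attribute back after writing would catch that kind of slip. As before, the attribute name is the one given in this report; the function name `disable_autosuspend` and the optional sysfs directory argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: disable autosuspend for one USB device by writing -1
# to power/autosuspend, then read the value back to confirm it took effect.
# Usage: disable_autosuspend DEVICE [SYSFS_DIR]
disable_autosuspend() {
    attr="${2:-/sys/bus/usb/devices}/$1/power/autosuspend"
    if [ ! -f "$attr" ]; then
        echo "no autosuspend attribute at $attr" >&2
        return 1
    fi
    # printf avoids any shell-specific handling of a leading "-" in echo.
    printf '%s\n' -1 > "$attr"
    # Compare what the attribute now reports with what was requested.
    actual=$(cat "$attr")
    if [ "$actual" != "-1" ]; then
        echo "expected -1 in $attr, found $actual" >&2
        return 1
    fi
}
```

For the disk in this report, the call would be `disable_autosuspend 1-4`, run as root.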

In fact, there's an excellent chance that the final release version of 2.6.23 will make -1 the default value for the idle delay, so that devices won't autosuspend unless the user specifically changes the value.
Comment 22 Alan Stern 2007-09-11 11:55:26 UTC
I submitted a patch to disable autosuspend for this device, and it has been accepted for 2.6.23.  Closing the bug.
