Bug 36092 - Filesystem lockup on USB HDD plug-in
Summary: Filesystem lockup on USB HDD plug-in
Status: CLOSED CODE_FIX
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 high
Assignee: io_other
URL:
Keywords:
Duplicates: 36102
Depends on:
Blocks: 32012
Reported: 2011-05-28 12:34 UTC by Arno Wagner
Modified: 2011-07-10 10:47 UTC (History)
CC List: 4 users

See Also:
Kernel Version: 2.6.38.6, 2.6.38.7, 2.6.39.1
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
Crash log (1.88 KB, application/octet-stream)
2011-05-28 12:34 UTC, Arno Wagner
kernel config for crash (66.77 KB, application/octet-stream)
2011-05-28 12:35 UTC, Arno Wagner
lspci -v (10.16 KB, application/octet-stream)
2011-05-28 12:35 UTC, Arno Wagner
lsusb -v recorded with kernel 2.6.38.5 (22.24 KB, application/octet-stream)
2011-05-28 12:38 UTC, Arno Wagner

Description Arno Wagner 2011-05-28 12:34:28 UTC
Created attachment 59762
Crash log

Hi,

I recently upgraded to kernel 2.6.39 and today found that plugging in a specific WD USB HDD causes a hard filesystem lockup. Manual regression testing showed that the problem goes back to 2.6.38.6, but is not present in 2.6.38.5 or 2.6.38.4. It does not happen with two other (slightly older) WD USB HDDs.

Attached are
- Crash log captured from serial console
- lspci
- lsusb
- kernel .config file

If you need anything else, please let me know.

Regards,
Arno
Comment 1 Arno Wagner 2011-05-28 12:35:18 UTC
Created attachment 59772
kernel config for crash
Comment 2 Arno Wagner 2011-05-28 12:35:44 UTC
Created attachment 59782
lspci -v
Comment 3 Arno Wagner 2011-05-28 12:38:39 UTC
Created attachment 59792
lsusb -v recorded with kernel 2.6.38.5
Comment 4 Arno Wagner 2011-06-09 10:21:43 UTC
*** Bug 36102 has been marked as a duplicate of this bug. ***
Comment 5 Arno Wagner 2011-06-09 10:25:37 UTC
Bug is still present with 2.6.39.1.

Some experimental results (2.6.39.1):
- With "high memory support" at 64G or 4G: 
      reliable crash on device detection
- With "high memory support" at "off" (PAE on or off): 
      no crash
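
For reference, the "high memory support" choice on 32-bit x86 maps to mutually exclusive symbols in the kernel .config. The fragment below is only a sketch of the two setups described above, assuming the standard x86 symbol names; the attached kernel config remains the authoritative record of what was actually used.

```
# Crashing setups: high memory support at 4G or 64G (one of the two)
CONFIG_HIGHMEM4G=y
# CONFIG_HIGHMEM64G=y

# Non-crashing setup: high memory support off
# CONFIG_NOHIGHMEM=y
```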
Comment 6 Arno Wagner 2011-06-13 16:36:42 UTC
I have now had a UML process crash 2.6.38 hard two times. (This never happened before; previously only the UML process itself would crash, which is due to an application that has memory leaks and is isolated in the UML instance for that reason.) This makes me suspect that the root cause of this problem was already introduced in 2.6.38. Maybe the BKL elimination went wrong in some places.
Comment 7 Rafael J. Wysocki 2011-06-13 17:16:58 UTC
On Monday, June 13, 2011, Arno Wagner wrote:
> Anyways, this regression is active and seems to get worse.
> Or I am just finding more ways to crash 2.6.38.x and 2.6.39.x.
> 
> 
> On Sun, Jun 12, 2011 at 11:12:10PM +0200, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.38 and 2.6.39.
> > 
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.38 and 2.6.39.  Please verify if it still should
> > be listed and let the tracking team know (either way).
> > 
> > 
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=36092
> > Subject             : Filesystem lockup on USB HDD plug-in
> > Submitter   : Arno Wagner <arno@wagner.name>
> > Date                : 2011-05-28 12:34 (16 days old)
Comment 8 Arno Wagner 2011-06-15 20:45:44 UTC
Found some time to test newer versions. Current findings on plug-in of the "special" USB HDD:

- 2.6.38 ... 2.6.38.5 do not crash but are unstable anyways, see above
- 2.6.38.6 crashes
- 2.6.38.7 crashes
- 2.6.38.8 does not crash. No idea about general stability.

- 2.6.39.1 crashes

- 3.0-rc3  does not crash

It seems that one of the numerous fixes in 2.6.38.8 at least made the problem harder to trigger, and may have fixed it. 3.0-rc3 seems to have gotten a relevant fix as well.
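
To narrow down where the behaviour changed within the 2.6.38.y stable series, the findings above can be scanned for adjacent versions with different results. This is only an illustrative sketch: the list `STABLE_2_6_38` and the helper `transitions` are hypothetical names, the data is copied from the findings in this comment, and only the endpoints of the "2.6.38 ... 2.6.38.5" range are listed.

```python
# Results for the 2.6.38.y stable series, copied from the findings above.
# True means "crashes on plug-in of the specific USB HDD".
STABLE_2_6_38 = [
    ("2.6.38", False),
    ("2.6.38.5", False),
    ("2.6.38.6", True),
    ("2.6.38.7", True),
    ("2.6.38.8", False),
]


def transitions(results):
    """Return (previous, current, crashes_now) for each adjacent pair
    of tested versions whose crash status differs."""
    changes = []
    for (prev_ver, prev_bad), (ver, bad) in zip(results, results[1:]):
        if prev_bad != bad:
            changes.append((prev_ver, ver, bad))
    return changes


if __name__ == "__main__":
    for prev_ver, ver, bad in transitions(STABLE_2_6_38):
        state = "starts crashing" if bad else "stops crashing"
        print(f"{prev_ver} -> {ver}: {state}")
```

Each reported pair marks a version range that would be a candidate for bisection, since the "Bisected commit-id" field of this bug is still empty.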
Comment 9 Arno Wagner 2011-06-20 02:48:36 UTC
Update:
2.6.38.8 crashed on me today on my server with partial loss of
filesystem access. Logging in via SSH was still possible for a few minutes.
This was with the 4GB memory model. I am now back to the 1GB 
memory model on that machine, despite it having 4GB ECC RAM.

As I observe this instability on two different machines with 
different chipsets, mainboards, CPUs, ..., I conclude the kernel
and not the hardware is to blame. I hope this mess will be
gone in 3.0 or at least 3.0.x with low x.

Hence:
- 2.6.38.8: unstable
Comment 10 Rafael J. Wysocki 2011-06-26 22:23:08 UTC
Do I understand that the problem is not present in the current mainline?
Comment 11 Arno Wagner 2011-06-27 01:57:15 UTC
There are actually two problems:
1. Immediate filesystem lockup when I plug in a specific USB HDD
2. Filesystem lockups when a process I encapsulate in UML causes repeated OOM kill actions within the UML instance. 

Both look pretty similar. Both go away with the 1GB memory model. The USB problem is fixed in 2.6.38.8 and 3.0-rc3, but not in 2.6.39.1. The UML-triggered crash is still present in 2.6.38.8.

As 3.0 is not yet released, the current mainline should be 2.6.39, right? 
The USB plug-in crash with the specific HDD reported here is present in 2.6.39.1. I have not yet tested 2.6.39.2; hopefully I can try it in the next few days.

I have no idea about the stability of 3.0 with regard to the other crash issue; I am not going to put an -rc on my production server.

But as I said, I strongly suspect the crash on plugin of the USB HDD is just a symptom of a far more serious problem when using a 4GB or 64GB memory model.
Comment 12 Rafael J. Wysocki 2011-06-28 08:20:41 UTC
On Tuesday, June 28, 2011, Arno Wagner wrote:
> 
> Not present in 2.6.39.2 anymore. As it is also not 
> present in 2.6.38.8 and 3.0-rc3, this specific issue 
> has been fixed. 
> 
> Note that at least 2.6.38.8 still is unstable when 
> running with 4GB or 64GB memory model and 2.6.39.2 might 
> be too.
> 
> Arno
> 
> 
> On Mon, Jun 27, 2011 at 12:35:11AM +0200, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.38 and 2.6.39.
> > 
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.38 and 2.6.39.  Please verify if it still should
> > be listed and let the tracking team know (either way).
> > 
> > 
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=36092
> > Subject             : Filesystem lockup on USB HDD plug-in
> > Submitter   : Arno Wagner <arno@wagner.name>
> > Date                : 2011-05-28 12:34 (30 days old)
