Bug 15225 - kernel panics triggered under high load
Summary: kernel panics triggered under high load
Status: RESOLVED OBSOLETE
Alias: None
Product: Virtualization
Classification: Unclassified
Component: Xen
Hardware: All Linux
Importance: P1 normal
Assignee: virtualization_xen
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-02-04 12:38 UTC by Brad Plant
Modified: 2012-06-27 13:18 UTC
1 user

See Also:
Kernel Version: 2.6.32.7
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Stack trace 1 (6.84 KB, text/plain)
2010-02-04 12:38 UTC, Brad Plant
stack trace 2 (5.61 KB, text/plain)
2010-02-04 12:38 UTC, Brad Plant

Description Brad Plant 2010-02-04 12:38:23 UTC
Created attachment 24907
Stack trace 1

I've been putting some pvops xen nodes under very high load by doing lots of file IO on an ocfs2 FS which of course generates lots of network IO for the ocfs2 locking stuff.

I'm getting lots of kernel panics when putting the nodes under this very high load. I've tried both with and without CONFIG_PARAVIRT_SPINLOCKS, but it didn't make any difference. I've uploaded all the stack traces I've collected, along with a gdb backtrace.

I'm not sure if this is actually a xen/pvops issue. It could be an ocfs2 issue, but ocfs2 isn't mentioned in the stack traces. If it's not a xen/pvops issue, what is it likely to be?
Comment 1 Brad Plant 2010-02-04 12:38:53 UTC
Created attachment 24908
stack trace 2
Comment 2 Alan 2010-02-09 15:55:25 UTC
These look like fairly random memory corruption - which makes it very hard to guess where the fault might be.

Do you have a previous known stable configuration ?
Do you see the crashes on multiple hardware boxes ?
Comment 3 Brad Plant 2010-02-09 20:39:20 UTC
(In reply to comment #2)
> These look like fairly random memory corruption - which makes it very hard to
> guess where the fault might be.

Should I enable some of the kernel memory debugging features? There are a lot of options under the kernel debugging menu; which ones would you suggest I enable?

> Do you have a previous known stable configuration ?

No, I've seen similar behaviour in the past: http://bugzilla.kernel.org/show_bug.cgi?id=13631

Unfortunately in the past it's been a case of everyone pointing the finger at someone else. After putting the issue aside for a while, I thought it'd be good to start fresh with a new kernel release :)

> Do you see the crashes on multiple hardware boxes ?

Yes, I've got two Dell 2950s with ECC memory and it happens on both. I've run memtest86 on these boxes before and it has never shown any problems.

The problem only shows when putting the xen guest under high load and no other guests are affected or ever show any problems.
Comment 4 Alan 2012-06-27 13:18:03 UTC
Not much we could do with the report, alas. Closing it as obsolete.
