Bug 11543 - kernel panic: softlockup in tick_periodic() ???
Summary: kernel panic: softlockup in tick_periodic() ???
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: x86-64
Hardware: All Linux
Importance: P1 normal
Assignee: platform_x86_64@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks: Regressions-2.6.26
Reported: 2008-09-11 16:46 UTC by Joshua Hoblitt
Modified: 2008-12-08 10:34 UTC
CC List: 2 users

See Also:
Kernel Version: 2.6.27-rc4-21704-gd25e26b
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
dmesg (42.41 KB, text/plain), 2008-09-11 16:50 UTC, Joshua Hoblitt
acpi table dump (85.80 KB, text/plain), 2008-09-17 02:18 UTC, Joshua Hoblitt
kernel panics from 2008-10-17 -> 2008-11-13 (75.73 KB, text/plain), 2008-11-13 13:50 UTC, Joshua Hoblitt
2.6.27-rc8 .config (46.12 KB, application/octet-stream), 2008-11-17 11:51 UTC, Joshua Hoblitt

Description Joshua Hoblitt 2008-09-11 16:46:29 UTC
[11532.103605] do_IRQ: 0.175 No irq handler for vector
<Sep/11 12:13 pm>[11532.103613] do_IRQ: 2.175 No irq handler for vector
<Sep/11 12:13 pm>[11532.103617] do_IRQ: 1.175 No irq handler for vector
<Sep/11 12:14 pm>[11560.779989] do_IRQ: 0.179 No irq handler for vector
<Sep/11 12:15 pm>[11622.181968] Kernel panic - not syncing: softlockup: hung tas
<Sep/11 12:15 pm>[11622.181968] ------------[ cut here ]------------
<Sep/11 12:15 pm>[11622.181968] WARNING: at kernel/mutex.c:351 mutex_trylock+0x45/0xf6()
<Sep/11 12:15 pm>[11622.181968] Modules linked in: w83627hf hwmon_vid autofs4 smsc37b787_wdt k8temp forcedeth i2c_nforce2 i2c_core tg3 libphy e1000 xfs dm_snapshot dm_mirror dm_log aacraid 3w_9xxx 3w_xxxx atp870u arcmsr aic7xxx scsi_wait_scan
<Sep/11 12:15 pm>[11622.181968] Pid: 17192, comm: ppImage Not tainted 2.6.27-rc4-21704-gd25e26b #1
<Sep/11 12:15 pm>[11622.181968] 
<Sep/11 12:15 pm>[11622.181968] Call Trace:
<Sep/11 12:15 pm>[11622.181968]  <IRQ>  [<ffffffff80235319>] warn_on_slowpath+0x51/0x77
<Sep/11 12:15 pm>[11622.181968]  [<ffffffff802357ce>] release_console_sem+0x3e/0x1a1
<Sep/11 12:15 pm>[11622.181968]  [<ffffffff805c6031>] mutex_trylock+0x45/0xf6
<Sep/11 12:15 pm>[11622.181968]  [<ffffffff8025efc3>] crash_kexec+0x17/0xef
<Sep/11 12:15 pm>[11622.181968]  [<ffffffff803bb5b9>] bust_spinlocks+0x15/0x30
<Sep/11 12:15 pm>[11622.181968]  [<ffffffff80235218>] panic+0x8f/0x13f
<Sep/11 12:15 pm>[11622.181968]  [<ffffffff802357ce>] release_console_sem+0x3e/0x1a1
<Sep/11 12:15 pm>[11622.181968]  [<ffffffff802357ce>] release_console_sem+0x3e/0x1a1
<Sep/11 12:15 pm>[11622.181968]  [<ffffffff80272145>] softlockup_tick+0x19e/0x1ab
<Sep/11 12:15 pm>[11622.181968]  [<ffffffff8023dda4>] update_process_times+0x26/0x4b

<Sep/11 12:15 pm>[11622.181968]  [<ffffffff8024f7a4>] tick_periodic+0x6e/0x79
<Sep/11 12:15 pm>[11622.181968]  [<ffffffff8024f7c7>] tick_handle_periodic+0x18/0x59
<Sep/11 12:15 pm>[11622.181968]  [<ffffffff8024f96a>] tick_do_broadcast+0x4d/0x86
<Sep/11 12:15 pm>[11622.181968]  [<ffffffff8024fa20>] tick_do_periodic_broadcast+0x23/0x31
<Sep/11 12:15 pm>[11622.181968]  [<ffffffff8024fa3c>] tick_handle_periodic_broadcast+0xe/0x42
<Sep/11 12:15 pm>[11622.181968]  [<ffffffff8020e9f6>] timer_event_interrupt+0x1a/0x21
<Sep/11 12:15 pm>[11622.181968]  [<ffffffff80272591>] handle_IRQ_event+0x1e/0x4c
<Sep/11 12:15 pm>[11622.181968]  [<ffffffff80273885>] handle_edge_irq+0xe8/0x12b
<Sep/11 12:15 pm>[11622.181968]  [<ffffffff8020e96f>] do_IRQ+0xf1/0x15e
<Sep/11 12:15 pm>[11622.181968]  [<ffffffff8020c3e1>] ret_from_intr+0x0/0xa
<Sep/11 12:15 pm>[11622.181968]  <EOI>  [<ffffffff8021b79b>] native_flush_tlb_others+0x64/0xb3
<Sep/11 12:15 pm>[11622.181968]  [<ffffffff8021b7c5>] native_flush_tlb_others+0x8e/0xb3
<Sep/11 12:15 pm>[11622.181968]  [<ffffffff8021b7be>] native_flush_tlb_others+0x87/0xb3
<Sep/11 12:15 pm>[11622.181968]  [<ffffffff8021b8b2>] flush_tlb_page+0x5e/0x65
<Sep/11 12:15 pm>[11622.181968]  [<ffffffff8022531b>] ptep_set_access_flags+0x1b/0x1f
<Sep/11 12:15 pm>[11622.181968]  [<ffffffff80285193>] do_wp_page+0x48b/0x51e
Comment 1 Joshua Hoblitt 2008-09-11 16:50:47 UTC
Created attachment 17734
dmesg
Comment 2 Joshua Hoblitt 2008-09-11 16:51:49 UTC
I just noticed that there are warnings in the dmesg about the NMI watchdog.  Is this the watchdog malfunctioning?

[    0.456700] Testing NMI watchdog ... 
[    0.529091] WARNING: CPU#0: NMI appears to be stuck (0->0)!
[    0.529968] Please report this to bugzilla.kernel.org,
[    0.533301] and attach the output of the 'dmesg' command.
[    0.536635] 
[    0.539968] WARNING: CPU#1: NMI appears to be stuck (0->0)!
[    0.543301] Please report this to bugzilla.kernel.org,
[    0.546635] and attach the output of the 'dmesg' command.
[    0.549968] 
[    0.551646] WARNING: CPU#2: NMI appears to be stuck (0->0)!
[    0.553301] Please report this to bugzilla.kernel.org,
[    0.556635] and attach the output of the 'dmesg' command.
[    0.559968] 
[    0.563302] WARNING: CPU#3: NMI appears to be stuck (0->0)!
[    0.566635] Please report this to bugzilla.kernel.org,
[    0.569968] and attach the output of the 'dmesg' command.
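
A note on reading the warning: the "(0->0)" pair most plausibly shows the per-CPU NMI count before and after the boot-time self-test interval, so "0->0" would mean no NMIs arrived on that CPU during the test. The sketch below models that check in userspace C. The function names and the threshold of 5 are assumptions for illustration, not taken from this report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumed minimum number of NMIs that must arrive during the test window. */
#define NMI_MIN_DELTA 5u

/* A CPU "appears stuck" if its NMI count barely advanced.
 * Unsigned subtraction keeps the comparison valid across counter wraparound. */
static bool nmi_appears_stuck(unsigned int before, unsigned int after)
{
    return after - before <= NMI_MIN_DELTA;
}

/* Print a warning in the same shape as the dmesg lines above and
 * return how many CPUs failed the check. */
static int count_stuck_cpus(const unsigned int *before,
                            const unsigned int *after, int ncpus)
{
    int stuck = 0;

    assert(before != NULL && after != NULL);
    for (int cpu = 0; cpu < ncpus; cpu++) {
        if (nmi_appears_stuck(before[cpu], after[cpu])) {
            printf("WARNING: CPU#%d: NMI appears to be stuck (%u->%u)!\n",
                   cpu, before[cpu], after[cpu]);
            stuck++;
        }
    }
    return stuck;
}
```

Under this reading, all four CPUs reporting "0->0" would suggest that the NMI source selected by nmi_watchdog=1 is not delivering interrupts at all, rather than that one CPU is wedged.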
Comment 3 Anonymous Emailer 2008-09-11 17:03:20 UTC
Reply-To: akpm@linux-foundation.org


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Thu, 11 Sep 2008 16:46:29 -0700 (PDT)
bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=11543
> 
>            Summary: kernel panic: softlockup in tick_periodic() ???
>            Product: Platform Specific/Hardware
>            Version: 2.5
>      KernelVersion: 2.6.27-rc4-21704-gd25e26b
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: x86-64
>         AssignedTo: platform_x86_64@kernel-bugs.osdl.org
>         ReportedBy: j_kernel@hoblitt.com
> 

Is this a regression?  Was 2.6.26 OK, for example?

> [11532.103605] do_IRQ: 0.175 No irq handler for vector
> <Sep/11 12:13 pm>[11532.103613] do_IRQ: 2.175 No irq handler for vector
> <Sep/11 12:13 pm>[11532.103617] do_IRQ: 1.175 No irq handler for vector
> <Sep/11 12:14 pm>[11560.779989] do_IRQ: 0.179 No irq handler for vector
> <Sep/11 12:15 pm>[11622.181968] Kernel panic - not syncing: softlockup: hung
> tas<Sep/11 12:15 pm>
>                  <Sep/11 12:15 pm>[11622.181968] ------------[ cut here
> ]------------
> <Sep/11 12:15 pm>[11622.181968] WARNING: at kernel/mutex.c:351
> mutex_trylock+0x45/0xf6()
> <Sep/11 12:15 pm>[11622.181968] Modules linked in: w83627hf hwmon_vid autofs4
> smsc37b787_wdt k8temp forcedeth i2c_nforce2 i2c_core tg3 libphy e1000 xfs
> dm_snapshot dm_mirror dm_log aacraid 3w_9xxx 3w_xxxx atp870u arcmsr aic7xxx
> scsi_wait_scan
> <Sep/11 12:15 pm>[11622.181968] Pid: 17192, comm: ppImage Not tainted
> 2.6.27-rc4-21704-gd25e26b #1
> <Sep/11 12:15 pm>[11622.181968] 
> <Sep/11 12:15 pm>[11622.181968] Call Trace:
> <Sep/11 12:15 pm>[11622.181968]  <IRQ>  [<ffffffff80235319>]
> warn_on_slowpath+0x51/0x77
> <Sep/11 12:15 pm>[11622.181968]  [<ffffffff802357ce>]
> release_console_sem+0x3e/0x1a1
> <Sep/11 12:15 pm>[11622.181968]  [<ffffffff805c6031>] mutex_trylock+0x45/0xf6
> <Sep/11 12:15 pm>[11622.181968]  [<ffffffff8025efc3>] crash_kexec+0x17/0xef
> <Sep/11 12:15 pm>[11622.181968]  [<ffffffff803bb5b9>]
> bust_spinlocks+0x15/0x30
> <Sep/11 12:15 pm>[11622.181968]  [<ffffffff80235218>] panic+0x8f/0x13f
> <Sep/11 12:15 pm>[11622.181968]  [<ffffffff802357ce>]
> release_console_sem+0x3e/0x1a1
> <Sep/11 12:15 pm>[11622.181968]  [<ffffffff802357ce>]
> release_console_sem+0x3e/0x1a1
> <Sep/11 12:15 pm>[11622.181968]  [<ffffffff80272145>]
> softlockup_tick+0x19e/0x1ab
> <Sep/11 12:15 pm>[11622.181968]  [<ffffffff8023dda4>]
> update_process_times+0x26/0x4b
> 
> <Sep/11 12:15 pm>[11622.181968]  [<ffffffff8024f7a4>] tick_periodic+0x6e/0x79
> <Sep/11 12:15 pm>[11622.181968]  [<ffffffff8024f7c7>]
> tick_handle_periodic+0x18/0x59
> <Sep/11 12:15 pm>[11622.181968]  [<ffffffff8024f96a>]
> tick_do_broadcast+0x4d/0x86
> <Sep/11 12:15 pm>[11622.181968]  [<ffffffff8024fa20>]
> tick_do_periodic_broadcast+0x23/0x31
> <Sep/11 12:15 pm>[11622.181968]  [<ffffffff8024fa3c>]
> tick_handle_periodic_broadcast+0xe/0x42
> <Sep/11 12:15 pm>[11622.181968]  [<ffffffff8020e9f6>]
> timer_event_interrupt+0x1a/0x21
> <Sep/11 12:15 pm>[11622.181968]  [<ffffffff80272591>]
> handle_IRQ_event+0x1e/0x4c
> <Sep/11 12:15 pm>[11622.181968]  [<ffffffff80273885>]
> handle_edge_irq+0xe8/0x12b
> <Sep/11 12:15 pm>[11622.181968]  [<ffffffff8020e96f>] do_IRQ+0xf1/0x15e
> <Sep/11 12:15 pm>[11622.181968]  [<ffffffff8020c3e1>] ret_from_intr+0x0/0xa
> <Sep/11 12:15 pm>[11622.181968]  <EOI>  [<ffffffff8021b79b>]
> native_flush_tlb_others+0x64/0xb3
> <Sep/11 12:15 pm>[11622.181968]  [<ffffffff8021b7c5>]
> native_flush_tlb_others+0x8e/0xb3
> <Sep/11 12:15 pm>[11622.181968]  [<ffffffff8021b7be>]
> native_flush_tlb_others+0x87/0xb3
> <Sep/11 12:15 pm>[11622.181968]  [<ffffffff8021b8b2>]
> flush_tlb_page+0x5e/0x65
> <Sep/11 12:15 pm>[11622.181968]  [<ffffffff8022531b>]
> ptep_set_access_flags+0x1b/0x1f
> <Sep/11 12:15 pm>[11622.181968]  [<ffffffff80285193>] do_wp_page+0x48b/0x51e

argh, death by wordwrapping.

I can't work out who called panic(), nor why.

The panic code called the kexec code which called mutex_trylock() which
called spin_lock_mutex() which then stupidly went and blurted a load of
debug stuff because of in_interrupt().

Something like this:

--- a/include/linux/debug_locks.h~a
+++ a/include/linux/debug_locks.h
@@ -17,7 +17,7 @@ extern int debug_locks_off(void);
 ({									\
 	int __ret = 0;							\
 									\
-	if (unlikely(c)) {						\
+	if (!oops_in_progress && unlikely(c)) {				\
 		if (debug_locks_off() && !debug_locks_silent)		\
 			WARN_ON(1);					\
 		__ret = 1;						\
_

might prevent the debugging code from preventing us from finding bugs :(
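
A minimal userspace sketch of what the one-line change above does, assuming a simplified model of the macro. The variable names mirror the kernel's, but `debug_locks_warn_on()` and `warnings_printed` are hypothetical helpers for illustration, and the stand-in for `debug_locks_off()` is much simpler than the real function.

```c
#include <stdio.h>

/* Userspace stand-ins for kernel state (simplified model). */
static int oops_in_progress;    /* nonzero once panic/oops handling has begun */
static int debug_locks = 1;     /* lock debugging is still enabled */
static int debug_locks_silent;  /* suppress the warning text when nonzero */
static int warnings_printed;    /* hypothetical counter, for illustration only */

/* Turn lock debugging off; report 1 only to the caller that switched it off. */
static int debug_locks_off(void)
{
    int was_on = debug_locks;

    debug_locks = 0;
    return was_on;
}

/* Function-shaped model of the patched DEBUG_LOCKS_WARN_ON(c).
 * With oops_in_progress set, the condition is ignored entirely, so a
 * panic path that takes locks from interrupt context stays quiet. */
static int debug_locks_warn_on(int c)
{
    int ret = 0;

    if (!oops_in_progress && c) {
        if (debug_locks_off() && !debug_locks_silent) {
            warnings_printed++;
            fprintf(stderr, "WARNING: lock debugging check failed\n");
        }
        ret = 1;
    }
    return ret;
}
```

In this model a failed check outside of a panic still warns once and disables further lock debugging, while the same check made after `oops_in_progress` is set returns 0 and prints nothing, which is the behaviour needed to keep the original panic trace readable.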
Comment 4 Joshua Hoblitt 2008-09-11 19:55:37 UTC
On Thu, Sep 11, 2008 at 05:02:58PM -0700, Andrew Morton wrote:
> Is this a regression?  Was 2.6.26 OK, for example?

It might be a regression. ;) The last build we were running on this
hardware was 2.6.24.2 and NMI watchdog support was not enabled.  We were
however experiencing random deadlocks, which I had been attributing to
problems with forcedeth.c (which causes the NIC to totally crap out
but not deadlock the machine) but I am now of the mind that there are
multiple problems with distinct failure modes.

> I can't work out who called panic(), nor why.

One more data point.  We booted this kernel on 14 machines this morning
and only one has had this panic thus far...

> The panic code called the kexec code which called mutex_trylock() which
> called spin_lock_mutex() which then stupidly went and blurted a load of
> debug stuff because of in_interrupt().
> 
> Something like this:
> 
> --- a/include/linux/debug_locks.h~a
> +++ a/include/linux/debug_locks.h
> @@ -17,7 +17,7 @@ extern int debug_locks_off(void);
>  ({                                                                   \
>       int __ret = 0;                                                  \
>                                                                       \
> -     if (unlikely(c)) {                                              \
> +     if (!oops_in_progress && unlikely(c)) {                         \
>               if (debug_locks_off() && !debug_locks_silent)           \
>                       WARN_ON(1);                                     \
>               __ret = 1;                                              \
> _
> 
> might prevent the debugging code from preventing us from finding bugs :(

Do you want me to give that patch a try or sit tight for a bit?

-J

--
Comment 5 Anonymous Emailer 2008-09-11 19:58:07 UTC
Reply-To: akpm@linux-foundation.org

On Thu, 11 Sep 2008 16:54:58 -1000 j_kernel@hoblitt.com wrote:

> > The panic code called the kexec code which called mutex_trylock() which
> > called spin_lock_mutex() which then stupidly went and blurted a load of
> > debug stuff because of in_interrupt().
> > 
> > Something like this:
> > 
> > --- a/include/linux/debug_locks.h~a
> > +++ a/include/linux/debug_locks.h
> > @@ -17,7 +17,7 @@ extern int debug_locks_off(void);
> >  ({                                                                 \
> >     int __ret = 0;                                                  \
> >                                                                     \
> > -   if (unlikely(c)) {                                              \
> > +   if (!oops_in_progress && unlikely(c)) {                         \
> >             if (debug_locks_off() && !debug_locks_silent)           \
> >                     WARN_ON(1);                                     \
> >             __ret = 1;                                              \
> > _
> > 
> > might prevent the debugging code from preventing us from finding bugs :(
> 
> Do you want me to give that patch a try or sit tight for a bit?

It'd be good if you could try it please, to see if we can get a cleaner
trace.
Comment 6 Thomas Gleixner 2008-09-12 00:43:37 UTC
> It'd be good if you could try it please, to see if we can get a cleaner
> trace.

We also get a cleaner trace when we switch off softlockup_panic.

But:

[    0.003333] hpet clockevent registered
[    0.056670] using C1E aware idle routine

Please update to 2.6.27-rc6. We fixed a couple of nasty bugs in that
area after rc4. The bugs result in hard locks or softlockup watchdog
hits.

Thanks,

	tglx
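
For reference, a hedged sketch of switching softlockup_panic off at run time. It assumes the setting is exposed as the sysctl file /proc/sys/kernel/softlockup_panic on this kernel; the helper name `write_setting()` is hypothetical, and the function simply writes a string to whatever path it is given.

```c
#include <assert.h>
#include <stdio.h>

/* Write `value` to the file at `path`.
 * Returns 0 on success and -1 if the file cannot be opened or written. */
static int write_setting(const char *path, const char *value)
{
    FILE *f;
    int ok;

    assert(path != NULL && value != NULL);
    f = fopen(path, "w");
    if (f == NULL)
        return -1;

    ok = fputs(value, f) >= 0;
    if (fclose(f) != 0)   /* buffered write errors can surface at close */
        ok = 0;
    return ok ? 0 : -1;
}
```

Run as root, a call such as `write_setting("/proc/sys/kernel/softlockup_panic", "0\n")` would, under the assumption above, let the soft lockup detector print its own trace and continue instead of calling panic().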
Comment 7 Ingo Molnar 2008-09-12 02:13:27 UTC
* Andrew Morton <akpm@linux-foundation.org> wrote:

> I can't work out who called panic(), nor why.
> 
> The panic code called the kexec code which called mutex_trylock() 
> which called spin_lock_mutex() which then stupidly went and blurted a 
> load of debug stuff because of in_interrupt().

agreed - applied your fix in the form below to tip/master - thanks 
Andrew.

J, you might want to try tip/master, it includes all known fixes for 
this area and this debug improvement as well. You can pick it up via:

  http://people.redhat.com/mingo/tip.git/README

	Ingo

---------->
From 53b9d87f41a3d8838210ad7cdef02d814817ce85 Mon Sep 17 00:00:00 2001
From: Andrew Morton <akpm@linux-foundation.org>
Date: Thu, 11 Sep 2008 17:02:58 -0700
Subject: [PATCH] lock debug: sit tight when we are already in a panic

in:

  > http://bugzilla.kernel.org/show_bug.cgi?id=11543

The panic code called the kexec code which called mutex_trylock() which
called spin_lock_mutex() which then stupidly went and blurted a load of
debug stuff because of in_interrupt().

Keep the lock debug code from escalating an already crappy situation.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 include/linux/debug_locks.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/debug_locks.h b/include/linux/debug_locks.h
index 4aaa4af..096476f 100644
--- a/include/linux/debug_locks.h
+++ b/include/linux/debug_locks.h
@@ -17,7 +17,7 @@ extern int debug_locks_off(void);
 ({									\
 	int __ret = 0;							\
 									\
-	if (unlikely(c)) {						\
+	if (!oops_in_progress && unlikely(c)) {				\
 		if (debug_locks_off() && !debug_locks_silent)		\
 			WARN_ON(1);					\
 		__ret = 1;						\
Comment 8 Joshua Hoblitt 2008-09-12 02:58:32 UTC
> ------- Comment #6 from tglx@linutronix.de  2008-09-12 00:43 -------
> > It'd be good if you could try it please, to see if we can get a cleaner
> > trace.
> 
> We also get a cleaner trace when we switch off softlockup_panic.
> 
> But:
> 
> [    0.003333] hpet clockevent registered
> [    0.056670] using C1E aware idle routine
> 
> Please update to 2.6.27-rc6. We fixed a couple of nasty bugs in that
> area after rc4. The bugs result in hard locks or softlockup watchdog
> hits.

Does -rc5 include the patches?  I'm using the netdev tree, trying to fix
another problem, and it hasn't pulled -rc6 yet.

-J

--
Comment 9 Thomas Gleixner 2008-09-12 03:49:53 UTC
> Does -rc5 include the patches?  I'm using the netdev tree, trying to fix
> another problem, and it hasn't pulled -rc6 yet.

No. But there is a combopatch against rc5 at:

http://bugzilla.kernel.org/attachment.cgi?id=17644
Comment 10 Anonymous Emailer 2008-09-12 17:13:20 UTC
Reply-To: josh@hoblitt.com

I just rolled out -rc5 from netdev + Andrew's debug patch + the HPET
patch Thomas pointed me at.  I'll let it roast on these 14 machines in
production over the weekend to see if we get another panic.

I'm attaching the dmesg from this kernel.  We're still getting the NMI
watchdog warning and the rtc is [still] hosed (I think it was last
working around -rc3).

-J

--
Comment 11 Anonymous Emailer 2008-09-15 14:06:51 UTC
Reply-To: josh@hoblitt.com

Since Friday, two different machines have crashed.  One was a total
deadlock with no response on the console.  The other reported the trace
below on the console and stopped responding to ssh, but I was able to log in
via the serial console and reboot the system.  This particular system has had
a number of "odd" kernel traces over the last year, and I'm starting to
wonder if it may have a bad DIMM in it, as its failure mode occasionally seems
to be different than the deadlocks etc. we see in the other 15 nodes with
identical hardware.

[30712.654542] general protection fault: 0000 [1] SMP 
<Sep/12 09:25 pm>[30712.657678] CPU 3 
<Sep/12 09:25 pm>[30712.657678] Modules linked in: w83627hf hwmon_vid autofs4 smsc37b787_wdt k8temp i2c_nforce2 i2c_core forcedeth tg3 libphy e1000 xfs dm_snapshot dm_mirror dm_log aacraid 3w_9xxx 3w_xxxx atp870u arcmsr aic7xxx scsi_wait_scan
<Sep/12 09:25 pm>[30712.657678] Pid: 1178, comm: rpciod/3 Not tainted 2.6.27-rc5-22033-gd26acd9-dirty #2
<Sep/12 09:25 pm>[30712.657678] RIP: 0010:[<ffffffff805ac57d>]  [<ffffffff805ac57d>] rpc_count_iostats+0x35/0xb8
<Sep/12 09:25 pm>[30712.657678] RSP: 0018:ffff88012e5d1e08  EFLAGS: 00010206
<Sep/12 09:25 pm>[30712.657678] RAX: ffffffff807adb48 RBX: ffff880126d61088 RCX: 0400000000000000
<Sep/12 09:25 pm>[30712.657678] RDX: ffff88022bcc0380 RSI: ffff88022bcc0000 RDI: ffff880126d61088
<Sep/12 09:25 pm>[30712.657678] RBP: ffff88022bc88000 R08: 0000000000000003 R09: ffff88022e038f10
<Sep/12 09:25 pm>[30712.657678] R10: 0000000000000001 R11: ffff88012e4a0048 R12: 0400000000000000
<Sep/12 09:25 pm>[30712.657678] R13: ffffffff8059f7a8 R14: ffff88022bc88610 R15: 0000000000000000
<Sep/12 09:25 pm>[30712.657678] FS:  00007f006fb306f0(0000) GS:ffff88022fa0d780(0000) knlGS:0000000000000000

<Sep/12 09:25 pm>[30712.657678] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<Sep/12 09:25 pm>[30712.657678] CR2: 00000000011e3018 CR3: 00000001f3527000 CR4:
<Sep/12 09:25 pm>[30712.657678] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<Sep/12 09:25 pm>[30712.657678] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<Sep/12 09:25 pm>[30712.657678] Process rpciod/3 (pid: 1178, threadinfo ffff88012e5d0000, task ffff88012e5acb50)
<Sep/12 09:25 pm>[30712.657678] Stack:  ffffffff8059b996 ffff880126d61088 00000000fffffff5 ffff880126d61118
<Sep/12 09:25 pm>[30712.657678]  ffffffff8059f7a8 ffffffff808a4b20 ffffffff80599aa1 ffffffff8059f7a8
<Sep/12 09:25 pm>[30712.657678]  0000000000000000 ffff880126d61088 ffffffff8059f5f3 ffff88012e5d1e90
<Sep/12 09:25 pm>[30712.657678] Call Trace:
<Sep/12 09:25 pm>[30712.657678]  [<ffffffff8059b996>] ? xprt_release+0x2c/0x172
<Sep/12 09:25 pm>[30712.657678]  [<ffffffff8059f7a8>] ? rpc_async_schedule+0x0/0xc
<Sep/12 09:25 pm>[30712.657678]  [<ffffffff80599aa1>] ? call_reserveresult+0x95/0xd1
<Sep/12 09:25 pm>[30712.657678]  [<ffffffff8059f7a8>] ? rpc_async_schedule+0x0/0xc
<Sep/12 09:25 pm>[30712.657678]  [<ffffffff8059f5f3>] ? __rpc_execute+0x73/0x228
<Sep/12 09:25 pm>[30712.657678]  [<ffffffff80244394>] ? run_workqueue+0xed/0x1ed
<Sep/12 09:25 pm>[30712.657678]  [<ffffffff8024433e>] ? run_workqueue+0x97/0x1ed
<Sep/12 09:25 pm>[30712.657678]  [<ffffffff80244ef7>] ? worker_thread+0xd8/0xe3
<Sep/12 09:25 pm>[30712.657678]  [<ffffffff8024791a>] ? autoremove_wake_function+0x0/0x2e
<Sep/12 09:25 pm>[30712.657678]  [<ffffffff80244e1f>] ? worker_thread+0x0/0xe3
<Sep/12 09:25 pm>[30712.657678]  [<ffffffff80247805>] ? kthread+0x47/0x76
<Sep/12 09:25 pm>[30712.657678]  [<ffffffff80230223>] ? schedule_tail+0x27/0x5f
<Sep/12 09:25 pm>[30712.657678]  [<ffffffff8020ce09>] ? child_rip+0xa/0x11

<Sep/12 09:25 pm>[30712.657678]  [<ffffffff8024767c>] ? kthreadd+0x167/0x18c
<Sep/12 09:25 pm>[30712.657678]  [<ffffffff802477be>] ? kthread+0x0/0x76
<Sep/12 09:25 pm>[30712.657678]  [<ffffffff8020cdff>] ? child_rip+0x0/0x11
<Sep/12 09:25 pm>[30712.657678] 
<Sep/12 09:25 pm>[30712.657678] 
<Sep/12 09:25 pm>[30712.657678] Code: b0 98 00 00 00 48 85 f6 0f 94 c2 48 85 c9 0f 94 c0 08 c2 0f 85 94 00 00 00 48 8b 47 38 8b 50 28 48 c1 e2 07 48 8d 14 16 48 ff 02 <48> 63 81 50 01 00 00 48 01 42 08 0f b7 87 e0 00 00 00 48 01 42 
<Sep/12 09:25 pm>[30712.657678] RIP  [<ffffffff805ac57d>] rpc_count_iostats+0x35/0xb8
<Sep/12 09:25 pm>[30712.657678]  RSP <ffff88012e5d1e08>
<Sep/12 09:25 pm>[30712.951578] ---[ end trace e646bef2de7f0b20 ]---
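
One observation on the register dump above: RCX and R12 both hold 0400000000000000, and the marked faulting bytes (48 63 81 50 01 00 00) appear to decode as a load through RCX. That value is not a canonical x86-64 address, which would explain a general protection fault rather than a page fault, and it has exactly one bit set. The sketch below checks both properties, assuming 48-bit virtual addressing; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr is canonical under 48-bit x86-64 virtual addressing:
 * bits 63..47 must all be copies of bit 47 (all zeros or all ones). */
static bool is_canonical48(uint64_t addr)
{
    uint64_t upper = addr >> 47;   /* the 17 bits 63..47 */

    return upper == 0 || upper == 0x1ffff;
}

/* Count set bits without relying on compiler builtins. */
static int popcount64(uint64_t v)
{
    int n = 0;

    while (v != 0) {
        v &= v - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

A pointer that is one bit away from NULL is consistent with the bad-DIMM suspicion, since the preceding code bytes look like a NULL test on RCX that this value would pass, but it does not prove a memory fault; a software bug could produce the same value.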

-J

--
On Fri, Sep 12, 2008 at 02:13:06PM -1000, Joshua Hoblitt wrote:
> I just rolled out -rc5 from netdev + Andrew's debug patch + the HPET
> patch Thomas pointed me at.  I'll let it roast on these 14 machines in
> production over the weekend to see if we get another panic.
> 
> I'm attaching the dmesg from this kernel.  We're still getting the NMI
> watchdog warning and the rtc is [still] hosed (I think it was last
> working around -rc3).
> 
> -J
> 
> --

> [    0.000000] Initializing cgroup subsys cpuset
> [    0.000000] Linux version 2.6.27-rc4-21704-gd25e26b (root@ipp000) (gcc version 4.1.2 (Gentoo 4.1.2 p1.1)) #1 SMP Tue Sep 2 16:05:45 HST 2008
> [    0.000000] Command line: root=/dev/ram0 real_root=/dev/sda3 init=/linuxrc nmi_watchdog=1 show_msr=1 console=tty0 console=ttyS0,115200n8
> [    0.000000] KERNEL supported cpus:
> [    0.000000]   Intel GenuineIntel
> [    0.000000]   AMD AuthenticAMD
> [    0.000000]   Centaur CentaurHauls
> [    0.000000] BIOS-provided physical RAM map:
> [    0.000000]  BIOS-e820: 0000000000000000 - 000000000009dc00 (usable)
> [    0.000000]  BIOS-e820: 000000000009dc00 - 00000000000a0000 (reserved)
> [    0.000000]  BIOS-e820: 00000000000ce000 - 0000000000100000 (reserved)
> [    0.000000]  BIOS-e820: 0000000000100000 - 00000000cff60000 (usable)
> [    0.000000]  BIOS-e820: 00000000cff60000 - 00000000cff6a000 (ACPI data)
> [    0.000000]  BIOS-e820: 00000000cff6a000 - 00000000cff80000 (ACPI NVS)
> [    0.000000]  BIOS-e820: 00000000cff80000 - 00000000d0000000 (reserved)
> [    0.000000]  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
> [    0.000000]  BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
> [    0.000000]  BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
> [    0.000000]  BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
> [    0.000000]  BIOS-e820: 0000000100000000 - 0000000230000000 (usable)
> [    0.000000] last_pfn = 0x230000 max_arch_pfn = 0x3ffffffff
> [    0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
> [    0.000000] last_pfn = 0xcff60 max_arch_pfn = 0x3ffffffff
> [    0.000000] init_memory_mapping
> [    0.000000]  0000000000 - 00cfe00000 page 2M
> [    0.000000]  00cfe00000 - 00cff60000 page 4k
> [    0.000000] kernel direct mapping tables up to cff60000 @ 8000-e000
> [    0.000000] last_map_addr: cff60000 end: cff60000
> [    0.000000] init_memory_mapping
> [    0.000000]  0100000000 - 0230000000 page 2M
> [    0.000000] kernel direct mapping tables up to 230000000 @ c000-16000
> [    0.000000] last_map_addr: 230000000 end: 230000000
> [    0.000000] RAMDISK: 37e0f000 - 37fef7dc
> [    0.000000] DMI present.
> [    0.000000] ACPI: RSDP 000F75D0, 0024 (r2 PTLTD )
> [    0.000000] ACPI: XSDT CFF656FA, 0064 (r1 PTLTD     XSDT    6040000  LTP        0)
> [    0.000000] ACPI: FACP CFF6970E, 00F4 (r3 NVIDIA CK8S      6040000 PTL_    F4240)
> [    0.000000] ACPI: DSDT CFF6575E, 3F3C (r1 NVIDIA      CK8  6040000 MSFT  3000000)
> [    0.000000] ACPI: FACS CFF6AFC0, 0040
> [    0.000000] ACPI: SSDT CFF69802, 0574 (r1 AMD    POWERNOW  6040000 AMD         1)
> [    0.000000] ACPI: SRAT CFF69D76, 0110 (r1 AMD    HAMMER    6040000 AMD         1)
> [    0.000000] ACPI: SPCR CFF69E86, 0050 (r1 PTLTD  $UCRTBL$  6040000 PTL         1)
> [    0.000000] ACPI: MCFG CFF69ED6, 003C (r1 PTLTD    MCFG    6040000  LTP        0)
> [    0.000000] ACPI: HPET CFF69F12, 0038 (r1 PTLTD  HPETTBL   6040000  LTP        1)
> [    0.000000] ACPI: APIC CFF69F4A, 008E (r1 PTLTD     APIC    6040000  LTP        0)
> [    0.000000] ACPI: BOOT CFF69FD8, 0028 (r1 PTLTD  $SBFTBL$  6040000  LTP        1)
> [    0.000000] SRAT: PXM 0 -> APIC 0 -> Node 0
> [    0.000000] SRAT: PXM 0 -> APIC 1 -> Node 0
> [    0.000000] SRAT: PXM 1 -> APIC 2 -> Node 1
> [    0.000000] SRAT: PXM 1 -> APIC 3 -> Node 1
> [    0.000000] SRAT: Node 0 PXM 0 0-a0000
> [    0.000000] SRAT: Node 0 PXM 0 100000-d0000000
> [    0.000000] SRAT: Node 0 PXM 0 100000000-130000000
> [    0.000000] SRAT: Node 1 PXM 1 130000000-230000000
> [    0.000000] NUMA: Allocated memnodemap from 11000 - 15680
> [    0.000000] NUMA: Using 20 for the hash shift.
> [    0.000000] Bootmem setup node 0 0000000000000000-0000000130000000
> [    0.000000]   NODE_DATA [0000000000015680 - 000000000002067f]
> [    0.000000]   bootmap [0000000000021000 -  0000000000046fff] pages 26
> [    0.000000] (8 early reservations) ==> bootmem [0000000000 - 0130000000]
> [    0.000000]   #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
> [    0.000000]   #1 [0000006000 - 0000008000]       TRAMPOLINE ==> [0000006000 - 0000008000]
> [    0.000000]   #2 [0000200000 - 0000f96890]    TEXT DATA BSS ==> [0000200000 - 0000f96890]
> [    0.000000]   #3 [0037e0f000 - 0037fef7dc]          RAMDISK ==> [0037e0f000 - 0037fef7dc]
> [    0.000000]   #4 [000009dc00 - 0000100000]    BIOS reserved ==> [000009dc00 - 0000100000]
> [    0.000000]   #5 [0000008000 - 000000c000]          PGTABLE ==> [0000008000 - 000000c000]
> [    0.000000]   #6 [000000c000 - 0000011000]          PGTABLE ==> [000000c000 - 0000011000]
> [    0.000000]   #7 [0000011000 - 0000015680]       MEMNODEMAP ==> [0000011000 - 0000015680]
> [    0.000000] Bootmem setup node 1 0000000130000000-0000000230000000
> [    0.000000]   NODE_DATA [0000000130000000 - 000000013000afff]
> [    0.000000]   bootmap [000000013000b000 -  000000013002afff] pages 20
> [    0.000000] (8 early reservations) ==> bootmem [0130000000 - 0230000000]
> [    0.000000]   #0 [0000000000 - 0000001000]   BIOS data page
> [    0.000000]   #1 [0000006000 - 0000008000]       TRAMPOLINE
> [    0.000000]   #2 [0000200000 - 0000f96890]    TEXT DATA BSS
> [    0.000000]   #3 [0037e0f000 - 0037fef7dc]          RAMDISK
> [    0.000000]   #4 [000009dc00 - 0000100000]    BIOS reserved
> [    0.000000]   #5 [0000008000 - 000000c000]          PGTABLE
> [    0.000000]   #6 [000000c000 - 0000011000]          PGTABLE
> [    0.000000]   #7 [0000011000 - 0000015680]       MEMNODEMAP
> [    0.000000] found SMP MP-table at [ffff8800000f7600] 000f7600
> [    0.000000]  [ffffe20000000000-ffffe200071fffff] PMD -> [ffff880028200000-ffff88002e1fffff] on node 0
> [    0.000000]  [ffffe20007200000-ffffe2000d1fffff] PMD -> [ffff880130200000-ffff8801361fffff] on node 1
> [    0.000000] Zone PFN ranges:
> [    0.000000]   DMA      0x00000000 -> 0x00001000
> [    0.000000]   DMA32    0x00001000 -> 0x00100000
> [    0.000000]   Normal   0x00100000 -> 0x00230000
> [    0.000000] Movable zone start PFN for each node
> [    0.000000] early_node_map[4] active PFN ranges
> [    0.000000]     0: 0x00000000 -> 0x0000009d
> [    0.000000]     0: 0x00000100 -> 0x000cff60
> [    0.000000]     0: 0x00100000 -> 0x00130000
> [    0.000000]     1: 0x00130000 -> 0x00230000
> [    0.000000] On node 0 totalpages: 1048317
> [    0.000000]   DMA zone: 307 pages, LIFO batch:0
> [    0.000000]   DMA32 zone: 823232 pages, LIFO batch:31
> [    0.000000]   Normal zone: 192000 pages, LIFO batch:31
> [    0.000000] On node 1 totalpages: 1048576
> [    0.000000]   Normal zone: 1024000 pages, LIFO batch:31
> [    0.000000] Detected use of extended apic ids on hypertransport bus
> [    0.000000] Detected use of extended apic ids on hypertransport bus
> [    0.000000] ACPI: PM-Timer IO Port: 0x1008
> [    0.000000] ACPI: Local APIC address 0xfee00000
> [    0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
> [    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
> [    0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
> [    0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
> [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
> [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
> [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
> [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
> [    0.000000] ACPI: IOAPIC (id[0x04] address[0xfec00000] gsi_base[0])
> [    0.000000] IOAPIC[0]: apic_id 4, version 0, address 0xfec00000, GSI 0-23
> [    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
> [    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
> [    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> [    0.000000] ACPI: IRQ0 used by override.
> [    0.000000] ACPI: IRQ2 used by override.
> [    0.000000] ACPI: IRQ9 used by override.
> [    0.000000] Setting APIC routing to flat
> [    0.000000] ACPI: HPET id: 0x10de8201 base: 0xfed00000
> [    0.000000] Using ACPI (MADT) for SMP configuration information
> [    0.000000] SMP: Allowing 4 CPUs, 0 hotplug CPUs
> [    0.000000] PM: Registered nosave memory: 000000000009d000 - 000000000009e000
> [    0.000000] PM: Registered nosave memory: 000000000009e000 - 00000000000a0000
> [    0.000000] PM: Registered nosave memory: 00000000000a0000 - 00000000000ce000
> [    0.000000] PM: Registered nosave memory: 00000000000ce000 - 0000000000100000
> [    0.000000] PM: Registered nosave memory: 00000000cff60000 - 00000000cff6a000
> [    0.000000] PM: Registered nosave memory: 00000000cff6a000 - 00000000cff80000
> [    0.000000] PM: Registered nosave memory: 00000000cff80000 - 00000000d0000000
> [    0.000000] PM: Registered nosave memory: 00000000d0000000 - 00000000e0000000
> [    0.000000] PM: Registered nosave memory: 00000000e0000000 - 00000000f0000000
> [    0.000000] PM: Registered nosave memory: 00000000f0000000 - 00000000fec00000
> [    0.000000] PM: Registered nosave memory: 00000000fec00000 - 00000000fec10000
> [    0.000000] PM: Registered nosave memory: 00000000fec10000 - 00000000fee00000
> [    0.000000] PM: Registered nosave memory: 00000000fee00000 - 00000000fee01000
> [    0.000000] PM: Registered nosave memory: 00000000fee01000 - 00000000fff80000
> [    0.000000] PM: Registered nosave memory: 00000000fff80000 - 0000000100000000
> [    0.000000] Allocating PCI resources starting at d1000000 (gap: d0000000:10000000)
> [    0.000000] PERCPU: Allocating 51872 bytes of per cpu data
> [    0.000000] NR_CPUS: 32, nr_cpu_ids: 4, nr_node_ids 2
> [    0.000000] Built 2 zonelists in Node order, mobility grouping on.  Total pages: 2039539
> [    0.000000] Policy zone: Normal
> [    0.000000] Kernel command line: root=/dev/ram0 real_root=/dev/sda3 init=/linuxrc nmi_watchdog=1 show_msr=1 console=tty0 console=ttyS0,115200n8
> [    0.000000] Initializing CPU#0
> [    0.000000] PID hash table entries: 4096 (order: 12, 32768 bytes)
> [    0.000000] Extended CMOS year: 2000
> [    0.000000] TSC calibrated against PM_TIMER
> [    0.000000] Detected 2800.009 MHz processor.
> [    0.003333] spurious 8259A interrupt: IRQ7.
> [    0.003333] Console: colour VGA+ 80x25
> [    0.003333] console [tty0] enabled
> [    0.003333] console [ttyS0] enabled
> [    0.003333] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
> [    0.003333] ... MAX_LOCKDEP_SUBCLASSES:    8
> [    0.003333] ... MAX_LOCK_DEPTH:          48
> [    0.003333] ... MAX_LOCKDEP_KEYS:        8191
> [    0.003333] ... CLASSHASH_SIZE:           4096
> [    0.003333] ... MAX_LOCKDEP_ENTRIES:     8192
> [    0.003333] ... MAX_LOCKDEP_CHAINS:      16384
> [    0.003333] ... CHAINHASH_SIZE:          8192
> [    0.003333]  memory used by lock dependency info: 3839 kB
> [    0.003333]  per task-struct memory footprint: 1920 bytes
> [    0.003333] Checking aperture...
> [    0.003333] No AGP bridge found
> [    0.003333] Node 0: aperture @ 0 size 32 MB
> [    0.003333] Your BIOS doesn't leave a aperture memory hole
> [    0.003333] Please enable the IOMMU option in the BIOS setup
> [    0.003333] This costs you 64 MB of RAM
> [    0.003333] Mapping aperture over 65536 KB of RAM @ 20000000
> [    0.003333] PM: Registered nosave memory: 0000000020000000 - 0000000024000000
> [    0.003333] Memory: 8108304k/9175040k available (3896k kernel code, 279268k reserved, 2243k data, 676k init)
> [    0.003333] CPA: page pool initialized 1 of 1 pages preallocated
> [    0.003333] SLUB: Genslabs=13, HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=2
> [    0.003333] hpet clockevent registered
> [    0.003333] Calibrating delay loop (skipped), value calculated using timer frequency.. 5602.35 BogoMIPS (lpj=9333363)
> [    0.006698] Security Framework initialized
> [    0.010737] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
> [    0.018937] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
> [    0.025421] Mount-cache hash table entries: 256
> [    0.026974] Initializing cgroup subsys ns
> [    0.030010] Initializing cgroup subsys cpuacct
> [    0.033335] Initializing cgroup subsys memory
> [    0.036674] Initializing cgroup subsys devices
> [    0.040006] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
> [    0.043332] CPU: L2 Cache: 1024K (64 bytes/line)
> [    0.046666] CPU 0/0 -> Node 0
> [    0.049999] tseg: 00cff80000
> [    0.050000] CPU: Physical Processor ID: 0
> [    0.053331] CPU: Processor Core ID: 0
> [    0.056670] using C1E aware idle routine
> [    0.060018] ACPI: Core revision 20080609
> [    0.070530] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> [    0.108872] activating NMI Watchdog ... done.
> [    0.109994] CPU0: Dual-Core AMD Opteron(tm) Processor 2220 stepping 03
> [    0.116660] Using local APIC timer interrupts.
> [    0.123326] APIC timer calibration result 12500076
> [    0.123328] Detected 12.500 MHz APIC timer.
> [    0.126660] APIC timer registered as dummy, due to nmi_watchdog=1!
> [    0.130154] lockdep: fixing up alternatives.
> [    0.133347] Booting processor 1/1 ip 6000
> [    0.003333] Initializing CPU#1
> [    0.003333] Calibrating delay using timer specific routine.. 5602.36 BogoMIPS (lpj=9333385)
> [    0.003333] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
> [    0.003333] CPU: L2 Cache: 1024K (64 bytes/line)
> [    0.003333] CPU 1/1 -> Node 0
> [    0.003333] CPU: Physical Processor ID: 0
> [    0.003333] CPU: Processor Core ID: 1
> [    0.003333] x86 PAT enabled: cpu 1, old 0x7040600070406, new 0x7010600070106
> [    0.230372] CPU1: Dual-Core AMD Opteron(tm) Processor 2220 stepping 03
> [    0.238145] lockdep: fixing up alternatives.
> [    0.239999] Booting processor 2/2 ip 6000
> [    0.003333] Initializing CPU#2
> [    0.003333] Calibrating delay using timer specific routine.. 5602.44 BogoMIPS (lpj=9333513)
> [    0.003333] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
> [    0.003333] CPU: L2 Cache: 1024K (64 bytes/line)
> [    0.003333] CPU 2/2 -> Node 1
> [    0.003333] CPU: Physical Processor ID: 1
> [    0.003333] CPU: Processor Core ID: 0
> [    0.003333] x86 PAT enabled: cpu 2, old 0x7040600070406, new 0x7010600070106
> [    0.340530] CPU2: Dual-Core AMD Opteron(tm) Processor 2220 stepping 03
> [    0.348138] lockdep: fixing up alternatives.
> [    0.349997] Booting processor 3/3 ip 6000
> [    0.003333] Initializing CPU#3
> [    0.003333] Calibrating delay using timer specific routine.. 5602.48 BogoMIPS (lpj=9333575)
> [    0.003333] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
> [    0.003333] CPU: L2 Cache: 1024K (64 bytes/line)
> [    0.003333] CPU 3/3 -> Node 1
> [    0.003333] CPU: Physical Processor ID: 1
> [    0.003333] CPU: Processor Core ID: 1
> [    0.003333] x86 PAT enabled: cpu 3, old 0x7040600070406, new 0x7010600070106
> [    0.447118] CPU3: Dual-Core AMD Opteron(tm) Processor 2220 stepping 03
> [    0.454734] Brought up 4 CPUs
> [    0.456640] Total of 4 processors activated (22409.64 BogoMIPS).
> [    0.460033] Testing NMI watchdog ... 
> [    0.532432] WARNING: CPU#0: NMI appears to be stuck (0->0)!
> [    0.533301] Please report this to bugzilla.kernel.org,
> [    0.536635] and attach the output of the 'dmesg' command.
> [    0.539968] 
> [    0.543302] WARNING: CPU#1: NMI appears to be stuck (0->0)!
> [    0.546635] Please report this to bugzilla.kernel.org,
> [    0.549968] and attach the output of the 'dmesg' command.
> [    0.553301] 
> [    0.554980] WARNING: CPU#2: NMI appears to be stuck (0->0)!
> [    0.556635] Please report this to bugzilla.kernel.org,
> [    0.559968] and attach the output of the 'dmesg' command.
> [    0.563301] 
> [    0.566635] WARNING: CPU#3: NMI appears to be stuck (0->0)!
> [    0.569968] Please report this to bugzilla.kernel.org,
> [    0.573301] and attach the output of the 'dmesg' command.
> [    0.577183] net_namespace: 968 bytes
> [    0.580160] xor: automatically using best checksumming function: generic_sse
> [    0.599969]    generic_sse:  8630.400 MB/sec
> [    0.603302] xor: using function: generic_sse (8630.400 MB/sec)
> [    0.606692] NET: Registered protocol family 16
> [    0.610254] No dock devices found.
> [    0.613503] node 0 link 2: io port [1000, 3fff]
> [    0.613506] TOM: 00000000d0000000 aka 3328M
> [    0.616636] node 0 link 2: mmio [d0000000, dfffffff]
> [    0.616638] node 0 link 2: mmio [a0000, bffff]
> [    0.616640] node 0 link 2: mmio [e0000000, e7ffffff]
> [    0.616643] TOM2: 0000000230000000 aka 8960M
> [    0.619969] bus: [00,ff] on node 0 link 2
> [    0.619970] bus: 00 index 0 io port: [0, ffff]
> [    0.619972] bus: 00 index 1 mmio: [d0000000, ffffffff]
> [    0.619974] bus: 00 index 2 mmio: [a0000, bffff]
> [    0.619975] bus: 00 index 3 mmio: [230000000, fcffffffff]
> [    0.619989] ACPI: bus type pci registered
> [    0.623448] PCI: MCFG configuration 0: base e0000000 segment 0 buses 0 - 4
> [    0.626636] PCI: MCFG area at e0000000 reserved in E820
> [    0.630065] PCI: Using MMCONFIG at e0000000 - e04fffff
> [    0.633302] PCI: Using configuration type 1 for base access
> [    0.643980] ACPI: EC: Look up EC in DSDT
> [    0.648409] ACPI: Interpreter enabled
> [    0.649969] ACPI: (supports S0 S1 S3 S4 S5)
> [    0.656635] ACPI: Using IOAPIC for interrupt routing
> [    0.670034] ACPI: PCI Root Bridge [PCI0] (0000:00)
> [    0.673616] pci 0000:00:01.1: PME# supported from D3hot D3cold
> [    0.676644] pci 0000:00:01.1: PME# disabled
> [    0.680009] pci 0000:00:02.0: supports D1
> [    0.680010] pci 0000:00:02.0: supports D2
> [    0.680012] pci 0000:00:02.0: PME# supported from D0 D1 D2 D3hot D3cold
> [    0.683303] pci 0000:00:02.0: PME# disabled
> [    0.686672] pci 0000:00:02.1: supports D1
> [    0.686673] pci 0000:00:02.1: supports D2
> [    0.686675] pci 0000:00:02.1: PME# supported from D0 D1 D2 D3hot D3cold
> [    0.689969] pci 0000:00:02.1: PME# disabled
> [    0.693535] pci 0000:00:08.0: supports D1
> [    0.693537] pci 0000:00:08.0: supports D2
> [    0.693538] pci 0000:00:08.0: PME# supported from D0 D1 D2 D3hot D3cold
> [    0.696637] pci 0000:00:08.0: PME# disabled
> [    0.700017] pci 0000:00:09.0: supports D1
> [    0.700019] pci 0000:00:09.0: supports D2
> [    0.700020] pci 0000:00:09.0: PME# supported from D0 D1 D2 D3hot D3cold
> [    0.703304] pci 0000:00:09.0: PME# disabled
> [    0.706664] pci 0000:00:0a.0: PME# supported from D0 D1 D2 D3hot D3cold
> [    0.709969] pci 0000:00:0a.0: PME# disabled
> [    0.713331] pci 0000:00:0d.0: PME# supported from D0 D1 D2 D3hot D3cold
> [    0.716636] pci 0000:00:0d.0: PME# disabled
> [    0.719997] pci 0000:00:0f.0: PME# supported from D0 D1 D2 D3hot D3cold
> [    0.723302] pci 0000:00:0f.0: PME# disabled
> [    0.726878] pci 0000:01:04.0: supports D1
> [    0.726879] pci 0000:01:04.0: supports D2
> [    0.726904] pci 0000:00:06.0: transparent bridge
> [    0.729969] PCI: bridge 0000:00:06.0 io port: [3000, 3fff]
> [    0.733303] PCI: bridge 0000:00:06.0 32bit mmio: [d0100000, d01fffff]
> [    0.736636] PCI: bridge 0000:00:06.0 32bit mmio pref: [d8000000, dfffffff]
> [    0.740034] pci 0000:02:00.0: supports D1
> [    0.740064] PCI: bridge 0000:00:0a.0 32bit mmio: [d0200000, d02fffff]
> [    0.743388] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
> [    0.743462] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P2P0._PRT]
> [    0.743486] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.XVR0._PRT]
> [    0.743513] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.XVR2._PRT]
> [    0.743540] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.XVR5._PRT]
> [    0.769795] ACPI: PCI Interrupt Link [LNK1] (IRQs 5 7 10 11 14 15 16 17 18 19 20 21 22 23) *0, disabled.
> [    0.777165] ACPI: PCI Interrupt Link [LNK2] (IRQs 5 7 *10 11 14 15 16 17 18 19 20 21 22 23)
> [    0.786790] ACPI: PCI Interrupt Link [LNK3] (IRQs 5 7 10 11 14 15 16 17 18 19 20 21 22 23) *0, disabled.
> [    0.797164] ACPI: PCI Interrupt Link [LNK4] (IRQs 5 7 10 11 14 15 16 17 18 19 20 21 22 23) *0, disabled.
> [    0.808626] ACPI: PCI Interrupt Link [LK1E] (IRQs 5 7 10 11 14 15 16 17 18 19 20 21 22 23) *0, disabled.
> [    0.816220] ACPI: PCI Interrupt Link [LK2E] (IRQs 5 7 10 *11 14 15 16 17 18 19 20 21 22 23)
> [    0.822660] ACPI: PCI Interrupt Link [LK3E] (IRQs 5 7 10 11 14 15 16 17 18 19 20 21 22 23) *0, disabled.
> [    0.830122] ACPI: PCI Interrupt Link [LK4E] (IRQs 5 7 10 11 14 15 16 17 18 19 20 21 22 23) *0, disabled.
> [    0.840505] ACPI: PCI Interrupt Link [LSMB] (IRQs 5 7 10 11 14 15 16 17 18 19 20 21 22 23) *0, disabled.
> [    0.851950] ACPI: PCI Interrupt Link [LUS0] (IRQs 5 7 *10 11 14 15 16 17 18 19 20 21 22 23)
> [    0.858235] ACPI: PCI Interrupt Link [LMA2] (IRQs 5 7 10 *11 14 15 16 17 18 19 20 21 22 23)
> [    0.867922] ACPI: PCI Interrupt Link [LUS2] (IRQs *5 7 10 11 14 15 16 17 18 19 20 21 22 23)
> [    0.877614] ACPI: PCI Interrupt Link [LMAC] (IRQs 5 7 *10 11 14 15 16 17 18 19 20 21 22 23)
> [    0.887303] ACPI: PCI Interrupt Link [LAZA] (IRQs 5 7 10 11 14 15 16 17 18 19 20 21 22 23) *0, disabled.
> [    0.898166] ACPI: PCI Interrupt Link [LPID] (IRQs 5 7 10 11 14 15 16 17 18 19 20 21 22 23) *0, disabled.
> [    0.909241] ACPI: PCI Interrupt Link [LTID] (IRQs 5 7 10 *11 14 15 16 17 18 19 20 21 22 23)
> [    0.915682] ACPI: PCI Interrupt Link [LSI1] (IRQs 5 7 *10 11 14 15 16 17 18 19 20 21 22 23)
> [    0.921892] ACPI: PCI Interrupt Link [LSI2] (IRQs 5 7 10 *11 14 15 16 17 18 19 20 21 22 23)
> [    0.928367] Linux Plug and Play Support v0.97 (c) Adam Belay
> [    0.930030] pnp: PnP ACPI init
> [    0.933313] ACPI: bus type pnp registered
> [    0.942967] pnp: PnP ACPI: found 15 devices
> [    0.943302] ACPI: ACPI bus type pnp unregistered
> [    0.946899] SCSI subsystem initialized
> [    0.950132] libata version 3.00 loaded.
> [    0.953483] usbcore: registered new interface driver usbfs
> [    0.956699] usbcore: registered new interface driver hub
> [    0.960060] usbcore: registered new device driver usb
> [    0.963671] PCI: Using ACPI for IRQ routing
> [    0.980227] PCI-DMA: Disabling AGP.
> [    0.984431] PCI-DMA: aperture base @ 20000000 size 65536 KB
> [    0.986635] PCI-DMA: using GART IOMMU.
> [    0.989971] PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture
> [    0.993870] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 31
> [    1.000341] hpet0: 3 32-bit timers, 25000000 Hz
> [    1.004395] ACPI: RTC can wake from S4
> [    1.006637] Clockevents: could not switch to one-shot mode:<6>Clockevents: could not switch to one-shot mode: lapic is not functional.
> [    1.009524] Could not switch to high resolution mode on CPU 3
> [    1.009528] Clockevents: could not switch to one-shot mode: lapic is not functional.
> [    1.009531] Could not switch to high resolution mode on CPU 1
> [    1.009534] Clockevents: could not switch to one-shot mode: lapic is not functional.
> [    1.009538] Could not switch to high resolution mode on CPU 2
> [    1.009969]  lapic is not functional.
> [    1.056944] Could not switch to high resolution mode on CPU 0
> [    1.076221] system 00:00: iomem range 0xffc00000-0xffffffff could not be reserved
> [    1.084054] system 00:00: iomem range 0xfec00000-0xfec00fff could not be reserved
> [    1.091883] system 00:00: iomem range 0xfee00000-0xfeefffff could not be reserved
> [    1.099719] system 00:00: iomem range 0xfed00000-0xfed00fff has been reserved
> [    1.107052] system 00:03: iomem range 0xd0000000-0xd0007fff has been reserved
> [    1.114383] system 00:04: ioport range 0x1000-0x107f has been reserved
> [    1.121101] system 00:04: ioport range 0x1080-0x10ff has been reserved
> [    1.127817] system 00:04: ioport range 0x1400-0x147f has been reserved
> [    1.134537] system 00:04: ioport range 0x1480-0x14ff has been reserved
> [    1.141258] system 00:04: ioport range 0x1800-0x187f has been reserved
> [    1.147976] system 00:04: ioport range 0x1880-0x18ff has been reserved
> [    1.154698] system 00:04: ioport range 0x2440-0x247f has been reserved
> [    1.161420] system 00:04: ioport range 0x2400-0x243f has been reserved
> [    1.168151] system 00:07: ioport range 0x4d0-0x4d1 has been reserved
> [    1.174695] system 00:07: ioport range 0xc05-0xc06 has been reserved
> [    1.182277] pci 0000:00:06.0: PCI bridge, secondary bus 0000:01
> [    1.188388] pci 0000:00:06.0:   IO window: 0x3000-0x3fff
> [    1.193889] pci 0000:00:06.0:   MEM window: 0xd0100000-0xd01fffff
> [    1.200176] pci 0000:00:06.0:   PREFETCH window: 0x000000d8000000-0x000000dfffffff
> [    1.208089] pci 0000:00:0a.0: PCI bridge, secondary bus 0000:02
> [    1.214205] pci 0000:00:0a.0:   IO window: disabled
> [    1.219275] pci 0000:00:0a.0:   MEM window: 0xd0200000-0xd02fffff
> [    1.225558] pci 0000:00:0a.0:   PREFETCH window: 0x000000d1000000-0x000000d10fffff
> [    1.233474] pci 0000:00:0d.0: PCI bridge, secondary bus 0000:03
> [    1.239598] pci 0000:00:0d.0:   IO window: disabled
> [    1.244666] pci 0000:00:0d.0:   MEM window: disabled
> [    1.249825] pci 0000:00:0d.0:   PREFETCH window: disabled
> [    1.255416] pci 0000:00:0f.0: PCI bridge, secondary bus 0000:04
> [    1.261525] pci 0000:00:0f.0:   IO window: disabled
> [    1.266594] pci 0000:00:0f.0:   MEM window: disabled
> [    1.271751] pci 0000:00:0f.0:   PREFETCH window: disabled
> [    1.277347] pci 0000:00:06.0: setting latency timer to 64
> [    1.277352] pci 0000:00:0a.0: setting latency timer to 64
> [    1.277357] pci 0000:00:0d.0: setting latency timer to 64
> [    1.277363] pci 0000:00:0f.0: setting latency timer to 64
> [    1.277365] bus: 00 index 0 io port: [0, ffff]
> [    1.282004] bus: 00 index 1 mmio: [0, ffffffffffffffff]
> [    1.292287] bus: 01 index 0 io port: [3000, 3fff]
> [    1.297187] bus: 01 index 1 mmio: [d0100000, d01fffff]
> [    1.302519] bus: 01 index 2 mmio: [d8000000, dfffffff]
> [    1.307849] bus: 01 index 3 io port: [0, ffff]
> [    1.312483] bus: 01 index 4 mmio: [0, ffffffffffffffff]
> [    1.317900] bus: 02 index 0 mmio: [0, 0]
> [    1.322017] bus: 02 index 1 mmio: [d0200000, d02fffff]
> [    1.327349] bus: 02 index 2 mmio: [d1000000, d10fffff]
> [    1.332678] bus: 02 index 3 mmio: [0, 0]
> [    1.336793] bus: 03 index 0 mmio: [0, 0]
> [    1.340910] bus: 03 index 1 mmio: [0, 0]
> [    1.345025] bus: 03 index 2 mmio: [0, 0]
> [    1.349142] bus: 03 index 3 mmio: [0, 0]
> [    1.353256] bus: 04 index 0 mmio: [0, 0]
> [    1.357374] bus: 04 index 1 mmio: [0, 0]
> [    1.361490] bus: 04 index 2 mmio: [0, 0]
> [    1.365609] bus: 04 index 3 mmio: [0, 0]
> [    1.369739] NET: Registered protocol family 2
> [    1.413137] IP route cache hash table entries: 262144 (order: 9, 2097152 bytes)
> [    1.423160] TCP established hash table entries: 524288 (order: 11, 8388608 bytes)
> [    1.436206] TCP bind hash table entries: 65536 (order: 9, 3670016 bytes)
> [    1.445874] TCP: Hash tables configured (established 524288 bind 65536)
> [    1.452716] TCP reno registered
> [    1.466350] NET: Registered protocol family 1
> [    1.471023] checking if image is initramfs... it is
> [    1.572646] Freeing initrd memory: 1921k freed
> [    1.578336] Simple Boot Flag at 0x37 set to 0x80
> [    1.585905] audit: initializing netlink socket (disabled)
> [    1.591560] type=2000 audit(1221176947.589:1): initialized
> [    1.602577] HugeTLB registered 2 MB page size, pre-allocated 0 pages
> [    1.613210] VFS: Disk quotas dquot_6.5.1
> [    1.617471] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
> [    1.625780] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
> [    1.632809] msgmni has been set to 15840
> [    1.637191] async_tx: api initialized (async)
> [    1.641747] io scheduler noop registered
> [    1.645871] io scheduler anticipatory registered
> [    1.650694] io scheduler deadline registered
> [    1.655347] io scheduler cfq registered (default)
> [    1.660281] pci 0000:00:00.0: Enabling HT MSI Mapping
> [    1.879582] pci 0000:00:05.0: Enabling HT MSI Mapping
> [    1.884851] pci 0000:00:05.1: Enabling HT MSI Mapping
> [    1.890123] pci 0000:00:05.2: Enabling HT MSI Mapping
> [    1.895393] pci 0000:00:06.0: Enabling HT MSI Mapping
> [    1.900669] pci 0000:00:08.0: Enabling HT MSI Mapping
> [    1.905945] pci 0000:00:09.0: Enabling HT MSI Mapping
> [    1.911213] pci 0000:00:0a.0: Enabling HT MSI Mapping
> [    1.916486] pci 0000:00:0d.0: Enabling HT MSI Mapping
> [    1.921761] pci 0000:00:0f.0: Enabling HT MSI Mapping
> [    1.927029] pci 0000:01:04.0: Boot video device
> [    1.927212] pcieport-driver 0000:00:0a.0: setting latency timer to 64
> [    1.927241] pcieport-driver 0000:00:0a.0: found MSI capability
> [    1.933324] pci_express 0000:00:0a.0:pcie00: allocate port service
> [    1.933414] pci_express 0000:00:0a.0:pcie01: allocate port service
> [    1.933475] pci_express 0000:00:0a.0:pcie03: allocate port service
> [    1.933576] pcieport-driver 0000:00:0d.0: setting latency timer to 64
> [    1.933604] pcieport-driver 0000:00:0d.0: found MSI capability
> [    1.939667] pci_express 0000:00:0d.0:pcie00: allocate port service
> [    1.939741] pci_express 0000:00:0d.0:pcie01: allocate port service
> [    1.939801] pci_express 0000:00:0d.0:pcie03: allocate port service
> [    1.939898] pcieport-driver 0000:00:0f.0: setting latency timer to 64
> [    1.939925] pcieport-driver 0000:00:0f.0: found MSI capability
> [    1.945977] pci_express 0000:00:0f.0:pcie00: allocate port service
> [    1.946048] pci_express 0000:00:0f.0:pcie01: allocate port service
> [    1.946108] pci_express 0000:00:0f.0:pcie03: allocate port service
> [    1.947934] aer 0000:00:0a.0:pcie01: AER service couldn't init device: no _OSC support
> [    1.949554] aer 0000:00:0d.0:pcie01: AER service couldn't init device: no _OSC support
> [    1.951148] aer 0000:00:0f.0:pcie01: AER service couldn't init device: no _OSC support
> [    1.951606] input: Power Button (FF) as /class/input/input0
> [    1.957384] ACPI: Power Button (FF) [PWRF]
> [    1.961801] input: Power Button (CM) as /class/input/input1
> [    1.967565] ACPI: Power Button (CM) [PWRB]
> [    1.972294] ACPI: CPU0 (power states: C1[C1] C2[C2] C3[C3])
> [    1.978363] processor ACPI0007:00: registered as cooling_device0
> [    1.984576] ACPI: Processor [C000] (supports 2 throttling states)
> [    1.991063] processor ACPI0007:01: registered as cooling_device1
> [    1.997366] processor ACPI0007:02: registered as cooling_device2
> [    2.003684] processor ACPI0007:03: registered as cooling_device3
> [    2.079999] hpet_resources: 0xfed00000 is busy
> [    2.080193] Non-volatile memory driver v1.2
> [    2.084715] Linux agpgart interface v0.103
> [    2.089107] Serial: 8250/16550 driver4 ports, IRQ sharing disabled
> [    2.095680] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
> [    2.102659] 00:0c: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
> [    2.108789] Floppy drive(s): fd0 is 1.44M
> [    2.130069] FDC 0 is a post-1991 82077
> [    2.137592] brd: module loaded
> [    2.142041] loop: module loaded
> [    2.145379] tun: Universal TUN/TAP device driver, 1.6
> [    2.150619] tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
> [    2.157181] console [netcon0] enabled
> [    2.161035] netconsole: network logging started
> [    2.165762] Uniform Multi-Platform E-IDE driver
> [    2.170615] ide_generic: please use "probe_mask=0x3f" module parameter for probing all legacy ISA IDE ports
> [    2.180715] Probing IDE interface ide0...
> [    3.149712] hdb: CD-224E-N, ATAPI CD/DVD-ROM drive
> [    3.206282] Probing IDE interface ide1...
> [    3.739617] ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
> [    3.744443] ide1 at 0x170-0x177,0x376 on irq 15
> [    3.751258] hdb: ATAPI 24X CD-ROM drive, 256kB Cache
> [    3.756646] Uniform CD-ROM driver Revision: 3.20
> [    3.771462] megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16 00:01:03 EST 2006)
> [    3.779263] megaraid: 2.20.5.1 (Release Date: Thu Nov 16 15:32:35 EST 2006)
> [    3.786512] megasas: 00.00.04.01 Thu July 24 11:41:51 PST 2008
> [    3.792690] Driver 'sd' needs updating - please use bus_type methods
> [    3.799315] Driver 'sr' needs updating - please use bus_type methods
> [    3.806449] sata_nv 0000:00:05.0: version 3.5
> [    3.806741] ACPI: PCI Interrupt Link [LTID] enabled at IRQ 23
> [    3.812700] sata_nv 0000:00:05.0: PCI INT A -> Link[LTID] -> GSI 23 (level, low) -> IRQ 23
> [    3.821311] sata_nv 0000:00:05.0: Using SWNCQ mode
> [    3.826430] sata_nv 0000:00:05.0: setting latency timer to 64
> [    3.826613] scsi0 : sata_nv
> [    3.829796] scsi1 : sata_nv
> [    3.832986] ata1: SATA max UDMA/133 cmd 0x24d0 ctl 0x24c4 bmdma 0x2490 irq 23
> [    3.840314] ata2: SATA max UDMA/133 cmd 0x24c8 ctl 0x24c0 bmdma 0x2498 irq 23
> [    4.179871] ata1: SATA link down (SStatus 0 SControl 300)
> [    4.516536] ata2: SATA link down (SStatus 0 SControl 300)
> [    4.522429] ACPI: PCI Interrupt Link [LSI1] enabled at IRQ 22
> [    4.528388] sata_nv 0000:00:05.1: PCI INT B -> Link[LSI1] -> GSI 22 (level, low) -> IRQ 22
> [    4.536997] sata_nv 0000:00:05.1: Using SWNCQ mode
> [    4.542083] sata_nv 0000:00:05.1: setting latency timer to 64
> [    4.542220] scsi2 : sata_nv
> [    4.545334] scsi3 : sata_nv
> [    4.548512] ata3: SATA max UDMA/133 cmd 0x24e8 ctl 0x24dc bmdma 0x24a0 irq 22
> [    4.555840] ata4: SATA max UDMA/133 cmd 0x24e0 ctl 0x24d8 bmdma 0x24a8 irq 22
> [    4.896535] ata3: SATA link down (SStatus 0 SControl 300)
> [    5.233200] ata4: SATA link down (SStatus 0 SControl 300)
> [    5.239087] ACPI: PCI Interrupt Link [LSI2] enabled at IRQ 21
> [    5.245047] sata_nv 0000:00:05.2: PCI INT C -> Link[LSI2] -> GSI 21 (level, low) -> IRQ 21
> [    5.253654] sata_nv 0000:00:05.2: Using SWNCQ mode
> [    5.258744] sata_nv 0000:00:05.2: setting latency timer to 64
> [    5.258881] scsi4 : sata_nv
> [    5.261998] scsi5 : sata_nv
> [    5.265174] ata5: SATA max UDMA/133 cmd 0x2800 ctl 0x24f4 bmdma 0x24b0 irq 21
> [    5.272504] ata6: SATA max UDMA/133 cmd 0x24f8 ctl 0x24f0 bmdma 0x24b8 irq 21
> [    5.613197] ata5: SATA link down (SStatus 0 SControl 300)
> [    5.949864] ata6: SATA link down (SStatus 0 SControl 300)
> [    5.955612] pata_amd 0000:00:04.0: version 0.3.10
> [    5.955640] pata_amd 0000:00:04.0: BAR 0: can't reserve I/O region [0x1f0-0x1f7]
> [    5.963385] pata_amd 0000:00:04.0: failed to request/iomap BARs for port 0 (errno=-16)
> [    5.971653] pata_amd 0000:00:04.0: BAR 2: can't reserve I/O region [0x170-0x177]
> [    5.979396] pata_amd 0000:00:04.0: failed to request/iomap BARs for port 1 (errno=-16)
> [    5.987656] pata_amd 0000:00:04.0: no available native port
> [    5.993731] Fusion MPT base driver 3.04.07
> [    5.998021] Copyright (c) 1999-2008 LSI Corporation
> [    6.003109] Fusion MPT SPI Host driver 3.04.07
> [    6.007833] Fusion MPT SAS Host driver 3.04.07
> [    6.013164] ieee1394: raw1394: /dev/raw1394 device initialized
> [    6.019695] ACPI: PCI Interrupt Link [LUS2] enabled at IRQ 20
> [    6.025650] ehci_hcd 0000:00:02.1: PCI INT B -> Link[LUS2] -> GSI 20 (level, low) -> IRQ 20
> [    6.034353] ehci_hcd 0000:00:02.1: setting latency timer to 64
> [    6.034355] ehci_hcd 0000:00:02.1: EHCI Host Controller
> [    6.040002] ehci_hcd 0000:00:02.1: new USB bus registered, assigned bus number 1
> [    6.047776] ehci_hcd 0000:00:02.1: debug port 1
> [    6.052503] ehci_hcd 0000:00:02.1: cache line size of 64 is not supported
> [    6.052521] ehci_hcd 0000:00:02.1: irq 20, io mem 0xd0041000
> [    6.066176] ehci_hcd 0000:00:02.1: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
> [    6.074154] usb usb1: configuration #1 chosen from 1 choice
> [    6.080016] hub 1-0:1.0: USB hub found
> [    6.083976] hub 1-0:1.0: 10 ports detected
> [    6.189886] ohci_hcd: 2006 August 04 USB 1.1 'Open' Host Controller (OHCI) Driver
> [    6.190142] ACPI: PCI Interrupt Link [LUS0] enabled at IRQ 19
> [    6.196100] ohci_hcd 0000:00:02.0: PCI INT A -> Link[LUS0] -> GSI 19 (level, low) -> IRQ 19
> [    6.204801] ohci_hcd 0000:00:02.0: setting latency timer to 64
> [    6.204804] ohci_hcd 0000:00:02.0: OHCI Host Controller
> [    6.210376] ohci_hcd 0000:00:02.0: new USB bus registered, assigned bus number 2
> [    6.218132] ohci_hcd 0000:00:02.0: irq 19, io mem 0xd0040000
> [    6.278315] usb usb2: configuration #1 chosen from 1 choice
> [    6.284165] hub 2-0:1.0: USB hub found
> [    6.288118] hub 2-0:1.0: 10 ports detected
> [    6.393142] USB Universal Host Controller Interface driver v3.0
> [    6.399423] usbcore: registered new interface driver usblp
> [    6.405107] Initializing USB Mass Storage driver...
> [    6.410243] usbcore: registered new interface driver usb-storage
> [    6.416443] USB Mass Storage support registered.
> [    6.421406] PNP: PS/2 Controller [PNP0303:KBC0,PNP0f13:MSE0] at 0x60,0x64 irq 1,12
> [    6.679856] serio: i8042 KBD port at 0x60,0x64 irq 1
> [    6.685238] mice: PS/2 mouse device common for all mice
> [    6.877983] md: linear personality registered for level -1
> [    6.883663] md: raid0 personality registered for level 0
> [    6.889166] md: raid1 personality registered for level 1
> [    6.894669] md: raid10 personality registered for level 10
> [    6.956180] raid6: int64x1   2587 MB/s
> [    7.016180] raid6: int64x2   3291 MB/s
> [    7.076168] raid6: int64x4   3160 MB/s
> [    7.136168] raid6: int64x8   2360 MB/s
> [    7.196178] raid6: sse2x1    3858 MB/s
> [    7.256175] raid6: sse2x2    5139 MB/s
> [    7.316176] raid6: sse2x4    5262 MB/s
> [    7.320122] raid6: using algorithm sse2x4 (5262 MB/s)
> [    7.325365] md: raid6 personality registered for level 6
> [    7.335744] md: raid5 personality registered for level 5
> [    7.341252] md: raid4 personality registered for level 4
> [    7.346753] md: multipath personality registered for level -4
> [    7.352960] device-mapper: ioctl: 4.14.0-ioctl (2008-04-23) initialised:
> dm-devel@redhat.com
> [    7.361877] cpuidle: using governor ladder
> [    7.366402] cpuidle: using governor menu
> [    7.370591] usbcore: registered new interface driver usbhid
> [    7.376362] usbhid: v2.6:USB HID core driver
> [    7.380936] oprofile: using NMI interrupt.
> [    7.385695] TCP cubic registered
> [    7.389332] NET: Registered protocol family 10
> [    7.394985] IPv6 over IPv4 tunneling driver
> [    7.400135] NET: Registered protocol family 17
> [    7.405122] RPC: Registered udp transport module.
> [    7.410021] RPC: Registered tcp transport module.
> [    7.415002] powernow-k8: Found 2 Dual-Core AMD Opteron(tm) Processor 2220
> processors (4 cpu cores) (version 2.20.00)
> [    7.425975] powernow-k8:    0 : fid 0x14 (2800 MHz), vid 0x8
> [    7.431831] powernow-k8:    1 : fid 0x12 (2600 MHz), vid 0xa
> [    7.437687] powernow-k8:    2 : fid 0x10 (2400 MHz), vid 0xc
> [    7.443537] powernow-k8:    3 : fid 0xe (2200 MHz), vid 0xe
> [    7.449301] powernow-k8:    4 : fid 0xc (2000 MHz), vid 0x10
> [    7.455149] powernow-k8:    5 : fid 0xa (1800 MHz), vid 0x10
> [    7.460999] powernow-k8:    6 : fid 0x2 (1000 MHz), vid 0x12
> [    7.467050] powernow-k8:    0 : fid 0x14 (2800 MHz), vid 0x8
> [    7.472929] powernow-k8:    1 : fid 0x12 (2600 MHz), vid 0xa
> [    7.478785] powernow-k8:    2 : fid 0x10 (2400 MHz), vid 0xc
> [    7.484637] powernow-k8:    3 : fid 0xe (2200 MHz), vid 0xe
> [    7.490411] powernow-k8:    4 : fid 0xc (2000 MHz), vid 0x10
> [    7.496279] powernow-k8:    5 : fid 0xa (1800 MHz), vid 0x10
> [    7.502134] powernow-k8:    6 : fid 0x2 (1000 MHz), vid 0x12
> [    7.508739] drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
> [    7.515289] Freeing unused kernel memory: 676k freed
> [    7.520818] Write protecting the kernel read-only data: 5672k
> [    7.742843] Clocksource tsc unstable (delta = -111149082 ns)
> [    8.177545] ACPI: PCI Interrupt Link [LK3E] enabled at IRQ 18
> [    8.177654] arcmsr 0000:02:00.0: PCI INT A -> Link[LK3E] -> GSI 18 (level,
> low) -> IRQ 18
> [    8.177666] arcmsr 0000:02:00.0: setting latency timer to 64
> [    8.189529] ARECA RAID ADAPTER6: FIRMWARE VERSION V1.44 2008-1-31  
> [    8.202850] scsi6 : Areca SATA Host Adapter RAID Controller( RAID6
> capable)
> [    8.202853]  Driver Version 1.20.00.15 2008/02/27
> [    8.205018] scsi 6:0:0:0: Direct-Access     Areca    ARC-1280-VOL#00  R001
> PQ: 0 ANSI: 5
> [    8.207052] sd 6:0:0:0: [sda] Very big device. Trying to use READ
> CAPACITY(16).
> [    8.207117] sd 6:0:0:0: [sda] 21484367872 512-byte hardware sectors
> (10999996 MB)
> [    8.207182] sd 6:0:0:0: [sda] Write Protect is off
> [    8.207188] sd 6:0:0:0: [sda] Mode Sense: cb 00 00 08
> [    8.207298] sd 6:0:0:0: [sda] Write cache: enabled, read cache: enabled,
> doesn't support DPO or FUA
> [    8.207560] sd 6:0:0:0: [sda] Very big device. Trying to use READ
> CAPACITY(16).
> [    8.207621] sd 6:0:0:0: [sda] 21484367872 512-byte hardware sectors
> (10999996 MB)
> [    8.207685] sd 6:0:0:0: [sda] Write Protect is off
> [    8.207692] sd 6:0:0:0: [sda] Mode Sense: cb 00 00 08
> [    8.207802] sd 6:0:0:0: [sda] Write cache: enabled, read cache: enabled,
> doesn't support DPO or FUA
> [    8.207811]  sda: sda1 sda2 sda3 sda4
> [    8.245400] sd 6:0:0:0: [sda] Attached SCSI disk
> [    8.247390] sd 6:0:0:0: Attached scsi generic sg0 type 0
> [    8.249720] scsi 6:0:16:0: Processor         Areca    RAID controller 
> R001 PQ: 0 ANSI: 0
> [    8.252259] scsi 6:0:16:0: Attached scsi generic sg1 type 3
> [    8.504889] 3ware Storage Controller device driver for Linux v1.26.02.002.
> [    8.542592] 3ware 9000 Storage Controller device driver for Linux
> v2.26.02.011.
> [    8.601009] Adaptec aacraid driver 1.1-5[2456]-ms
> [    8.993029] SGI XFS with ACLs, security attributes, large block/inode
> numbers, no debug enabled
> [    8.998828] SGI XFS Quota Management subsystem
> [    9.048982] Intel(R) PRO/1000 Network Driver - version 7.3.20-k3-NAPI
> [    9.048990] Copyright (c) 1999-2006 Intel Corporation.
> [   12.346739] EXT3-fs: INFO: recovery required on readonly filesystem.
> [   12.346748] EXT3-fs: write access will be enabled during recovery.
> [   13.161202] kjournald starting.  Commit interval 5 seconds
> [   13.161240] EXT3-fs: sda3: orphan cleanup on readonly fs
> [   13.161256] ext3_orphan_cleanup: deleting unreferenced inode 2501876
> [   13.161508] ext3_orphan_cleanup: deleting unreferenced inode 2501875
> [   13.161529] ext3_orphan_cleanup: deleting unreferenced inode 2501874
> [   13.161549] ext3_orphan_cleanup: deleting unreferenced inode 2501873
> [   13.161565] EXT3-fs: sda3: 4 orphan inodes deleted
> [   13.161570] EXT3-fs: recovery complete.
> [   13.162085] EXT3-fs: mounted filesystem with ordered data mode.
> [   15.634222] i2c-adapter i2c-0: nForce2 SMBus adapter at 0x2440
> [   15.634257] i2c-adapter i2c-1: nForce2 SMBus adapter at 0x2400
> [   15.660203] forcedeth: Reverse Engineered nForce ethernet driver. Version
> 0.61.
> [   15.660582] ACPI: PCI Interrupt Link [LMAC] enabled at IRQ 17
> [   15.660659] forcedeth 0000:00:08.0: PCI INT A -> Link[LMAC] -> GSI 17
> (level, low) -> IRQ 17
> [   15.660663] forcedeth 0000:00:08.0: setting latency timer to 64
> [   15.663568] forcedeth 0000:00:08.0: ifname eth0, PHY OUI 0x5043 @ 0, addr
> 00:e0:81:76:8b:be
> [   15.663571] forcedeth 0000:00:08.0: highdma csum vlan pwrctl mgmt timirq
> gbit lnktim msi desc-v3
> [   15.663876] ACPI: PCI Interrupt Link [LMA2] enabled at IRQ 16
> [   15.663898] forcedeth 0000:00:09.0: PCI INT A -> Link[LMA2] -> GSI 16
> (level, low) -> IRQ 16
> [   15.663923] forcedeth 0000:00:09.0: setting latency timer to 64
> [   15.665255] forcedeth 0000:00:09.0: ifname eth1, PHY OUI 0x5043 @ 1, addr
> 00:e0:81:76:8b:bf
> [   15.665257] forcedeth 0000:00:09.0: highdma csum vlan pwrctl mgmt timirq
> gbit lnktim msi desc-v3
> [   18.107517] EXT3 FS on sda3, internal journal
> [   18.380032] SMsC 37B787 watchdog component driver 1.1 initialising...
> [   18.381424] smsc37b787_wdt: Timeout set to 60 second(s).
> [   18.381429] smsc37b787_wdt: Watchdog initialized and sleeping
> (nowayout=0)...
> [   21.417259] kjournald starting.  Commit interval 5 seconds
> [   21.417297] EXT3-fs warning: checktime reached, running e2fsck is
> recommended
> [   21.422349] EXT3 FS on sda4, internal journal
> [   21.422359] EXT3-fs: recovery complete.
> [   21.422430] EXT3-fs: mounted filesystem with ordered data mode.
> [   21.501039] Adding 15999992k swap on /dev/sda2.  Priority:-1 extents:1
> across:15999992k
> [   42.038582] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state
> recovery directory
> [   42.045903] NFSD: starting 90-second grace period
> [   45.396027] eth0: no IPv6 routers present
> [   49.761642] w83627hf: Found W83627HF chip at 0xc00
Comment 12 Anonymous Emailer 2008-09-15 19:55:12 UTC
Reply-To: josh@hoblitt.com

In addition to the deadlocks, we still have the watchdog warning:

[    0.460034] Testing NMI watchdog ... 
[    0.532557] WARNING: CPU#0: NMI appears to be stuck (0->0)!
[    0.533301] Please report this to bugzilla.kernel.org,
[    0.536635] and attach the output of the 'dmesg' command.

Perhaps an HPET problem:

[    0.993800] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 31
[    0.999969] hpet0: 3 32-bit timers, 25000000 Hz
[    1.004396] ACPI: RTC can wake from S4
[    1.006637] Clockevents: could not switch to one-shot
mode:<6>Clockevents: could not switch to one-shot mode: lapic is not functional.
[    1.009844] Could not switch to high resolution mode on CPU 3
[    1.009848] Clockevents: could not switch to one-shot mode: lapic is not functional.
[    1.009852] Could not switch to high resolution mode on CPU 2
[    1.009855] Clockevents: could not switch to one-shot mode: lapic is not functional.
[    1.009858] Could not switch to high resolution mode on CPU 1
[    1.009969]  lapic is not functional.
[    1.056944] Could not switch to high resolution mode on CPU 0

There is also a failure to create a /dev/rtc[0] device with udev 115 or 119,
along with this entry in the dmesg:

[    7.498900] drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
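
To check whether other nodes show the same symptoms, a saved dmesg can be scanned for the messages quoted above. The following is only a minimal sketch: the match strings are copied from the excerpts in this comment, while the function name, the dictionary keys, and the file-path argument are hypothetical choices made for illustration.

```python
import re
import sys

# Message fragments copied from the dmesg excerpts in this comment.
# The keys are hypothetical labels, not kernel terminology.
SIGNATURES = {
    "nmi_stuck": re.compile(r"NMI appears to be stuck"),
    "no_oneshot": re.compile(r"could not switch to one-shot"),
    "no_highres": re.compile(r"Could not switch to high resolution mode"),
    "rtc_missing": re.compile(r"unable to open rtc device"),
}


def scan_dmesg(text):
    """Return, for each signature, the number of lines that contain it."""
    counts = {name: 0 for name in SIGNATURES}
    for line in text.splitlines():
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python scan_dmesg.py saved-dmesg.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as fh:
            for name, count in scan_dmesg(fh.read()).items():
                print(f"{name}: {count}")
```

The counts are per line, so interleaved console output (as in the Clockevents lines above) may be counted more than once; the sketch is meant for a quick comparison between machines, not for exact accounting.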

-J

--
On Mon, Sep 15, 2008 at 11:06:35AM -1000, Joshua Hoblitt wrote:
> Since Friday, two different machines have crashed.  One was a total
> deadlock with no response on the console.  The other reported the trace
> below on the console and stopped responding to ssh, but I was able to log in
> via the serial console and reboot the system.  This particular system has had
> a number of "odd" kernel traces over the last year, and I'm starting to
> wonder whether it has a bad DIMM, as its failure mode occasionally seems
> to differ from the deadlocks/etc. we see in the other 15 nodes with
> identical hardware.
> 
> [30712.654542] general protection fault: 0000 [1] SMP 
> <Sep/12 09:25 pm>[30712.657678] CPU 3 
> <Sep/12 09:25 pm>[30712.657678] Modules linked in: w83627hf hwmon_vid autofs4
> smsc37b787_wdt k8temp i2c_nforce2 i2c_core forcedeth tg3 libphy e1000 xfs
> dm_snapshot dm_mirror dm_log aacraid 3w_9xxx 3w_xxxx atp870u arcmsr aic7xxx
> scsi_wait_scan
> <Sep/12 09:25 pm>[30712.657678] Pid: 1178, comm: rpciod/3 Not tainted
> 2.6.27-rc5-22033-gd26acd9-dirty #2
> <Sep/12 09:25 pm>[30712.657678] RIP: 0010:[<ffffffff805ac57d>] 
> [<ffffffff805ac57d>] rpc_count_iostats+0x35/0xb8
> <Sep/12 09:25 pm>[30712.657678] RSP: 0018:ffff88012e5d1e08  EFLAGS: 00010206
> <Sep/12 09:25 pm>[30712.657678] RAX: ffffffff807adb48 RBX: ffff880126d61088
> RCX: 0400000000000000
> <Sep/12 09:25 pm>[30712.657678] RDX: ffff88022bcc0380 RSI: ffff88022bcc0000
> RDI: ffff880126d61088
> <Sep/12 09:25 pm>[30712.657678] RBP: ffff88022bc88000 R08: 0000000000000003
> R09: ffff88022e038f10
> <Sep/12 09:25 pm>[30712.657678] R10: 0000000000000001 R11: ffff88012e4a0048
> R12: 0400000000000000
> <Sep/12 09:25 pm>[30712.657678] R13: ffffffff8059f7a8 R14: ffff88022bc88610
> R15: 0000000000000000
> <Sep/12 09:25 pm>[30712.657678] FS:  00007f006fb306f0(0000)
> GS:ffff88022fa0d780(0000) knlGS:0000000000000000
> 
> <Sep/12 09:25 pm>[30712.657678] CS:  0010 DS: 0018 ES: 0018 CR0:
> 000000008005003b
> <Sep/12 09:25 pm>[30712.657678] CR2: 00000000011e3018 CR3: 00000001f3527000
> CR4:<Sep/12 09:25 pm>
>                  <Sep/12 09:25 pm>[30712.657678] DR0: 0000000000000000 DR1:
>                  0000000000000000 DR2: 0000000000000000
> <Sep/12 09:25 pm>[30712.657678] DR3: 0000000000000000 DR6: 00000000ffff0ff0
> DR7: 0000000000000400
> <Sep/12 09:25 pm>[30712.657678] Process rpciod/3 (pid: 1178, threadinfo
> ffff88012e5d0000, task ffff88012e5acb50)
> <Sep/12 09:25 pm>[30712.657678] Stack:  ffffffff8059b996 ffff880126d61088
> 00000000fffffff5 ffff880126d61118
> <Sep/12 09:25 pm>[30712.657678]  ffffffff8059f7a8 ffffffff808a4b20
> ffffffff80599aa1 ffffffff8059f7a8
> <Sep/12 09:25 pm>[30712.657678]  0000000000000000 ffff880126d61088
> ffffffff8059f5f3 ffff88012e5d1e90
> <Sep/12 09:25 pm>[30712.657678] Call Trace:
> <Sep/12 09:25 pm>[30712.657678]  [<ffffffff8059b996>] ?
> xprt_release+0x2c/0x172
> <Sep/12 09:25 pm>[30712.657678]  [<ffffffff8059f7a8>] ?
> rpc_async_schedule+0x0/0xc
> <Sep/12 09:25 pm>[30712.657678]  [<ffffffff80599aa1>] ?
> call_reserveresult+0x95/0xd1
> <Sep/12 09:25 pm>[30712.657678]  [<ffffffff8059f7a8>] ?
> rpc_async_schedule+0x0/0xc
> <Sep/12 09:25 pm>[30712.657678]  [<ffffffff8059f5f3>] ?
> __rpc_execute+0x73/0x228
> <Sep/12 09:25 pm>[30712.657678]  [<ffffffff80244394>] ?
> run_workqueue+0xed/0x1ed
> <Sep/12 09:25 pm>[30712.657678]  [<ffffffff8024433e>] ?
> run_workqueue+0x97/0x1ed
> <Sep/12 09:25 pm>[30712.657678]  [<ffffffff80244ef7>] ?
> worker_thread+0xd8/0xe3
> <Sep/12 09:25 pm>[30712.657678]  [<ffffffff8024791a>] ?
> autoremove_wake_function+0x0/0x2e
> <Sep/12 09:25 pm>[30712.657678]  [<ffffffff80244e1f>] ?
> worker_thread+0x0/0xe3
> <Sep/12 09:25 pm>[30712.657678]  [<ffffffff80247805>] ? kthread+0x47/0x76
> <Sep/12 09:25 pm>[30712.657678]  [<ffffffff80230223>] ?
> schedule_tail+0x27/0x5f
> <Sep/12 09:25 pm>[30712.657678]  [<ffffffff8020ce09>] ? child_rip+0xa/0x11
> 
> <Sep/12 09:25 pm>[30712.657678]  [<ffffffff8024767c>] ? kthreadd+0x167/0x18c
> <Sep/12 09:25 pm>[30712.657678]  [<ffffffff802477be>] ? kthread+0x0/0x76
> <Sep/12 09:25 pm>[30712.657678]  [<ffffffff8020cdff>] ? child_rip+0x0/0x11
> <Sep/12 09:25 pm>[30712.657678] 
> <Sep/12 09:25 pm>[30712.657678] 
> <Sep/12 09:25 pm>[30712.657678] Code: b0 98 00 00 00 48 85 f6 0f 94 c2 48 85
> c9 0f 94 c0 08 c2 0f 85 94 00 00 00 48 8b 47 38 8b 50 28 48 c1 e2 07 48 8d 14
> 16 48 ff 02 <48> 63 81 50 01 00 00 48 01 42 08 0f b7 87 e0 00 00 00 48 01 42 
> <Sep/12 09:25 pm>[30712.657678] RIP  [<ffffffff805ac57d>]
> rpc_count_iostats+0x35/0xb8
> <Sep/12 09:25 pm>[30712.657678]  RSP <ffff88012e5d1e08>
> <Sep/12 09:25 pm>[30712.951578] ---[ end trace e646bef2de7f0b20 ]---
> 
> -J
> 
> --
> On Fri, Sep 12, 2008 at 02:13:06PM -1000, Joshua Hoblitt wrote:
> > I just rolled out -rc5 from netdev + Andrew's debug patch + the HPET
> > patch Thomas pointed me at.  I'll let it roast on these 14 machines in
> > production over the weekend to see if we get another panic.
> > 
> > I'm attaching the dmesg from this kernel.  We're still getting the NMI
> > watchdog warning and the rtc is [still] hosed (I think it was last
> > working around -rc3).
> > 
> > -J
> > 
> > --
> > On Fri, Sep 12, 2008 at 11:13:08AM +0200, Ingo Molnar wrote:
> > > 
> > > * Andrew Morton <akpm@linux-foundation.org> wrote:
> > > 
> > > > I can't work out who called panic(), nor why.
> > > > 
> > > > The panic code called the kexec code which called mutex_trylock() 
> > > > which called spin_lock_mutex() which then stupidly went and blurted a 
> > > > load of debug stuff because of in_interrupt().
> > > 
> > > agreed - applied your fix in the form below to tip/master - thanks 
> > > Andrew.
> > > 
> > > J, you might want to try tip/master, it includes all known fixes for 
> > > this area and this debug improvement as well. You can pick it up via:
> > > 
> > >   http://people.redhat.com/mingo/tip.git/README
> > > 
> > >   Ingo
> > > 
> > > ---------->
> > > >From 53b9d87f41a3d8838210ad7cdef02d814817ce85 Mon Sep 17 00:00:00 2001
> > > From: Andrew Morton <akpm@linux-foundation.org>
> > > Date: Thu, 11 Sep 2008 17:02:58 -0700
> > > Subject: [PATCH] lock debug: sit tight when we are already in a panic
> > > 
> > > in:
> > > 
> > >   > http://bugzilla.kernel.org/show_bug.cgi?id=11543
> > > 
> > > The panic code called the kexec code which called mutex_trylock() which
> > > called spin_lock_mutex() which then stupidly went and blurted a load of
> > > debug stuff because of in_interrupt().
> > > 
> > > Keep the lock debug code from escallating an already crappy situation.
> > > 
> > > Signed-off-by: Ingo Molnar <mingo@elte.hu>
> > > ---
> > >  include/linux/debug_locks.h |    2 +-
> > >  1 files changed, 1 insertions(+), 1 deletions(-)
> > > 
> > > diff --git a/include/linux/debug_locks.h b/include/linux/debug_locks.h
> > > index 4aaa4af..096476f 100644
> > > --- a/include/linux/debug_locks.h
> > > +++ b/include/linux/debug_locks.h
> > > @@ -17,7 +17,7 @@ extern int debug_locks_off(void);
> > >  ({                                                                      \
> > >   int __ret = 0;                                                  \
> > >                                                                   \
> > > - if (unlikely(c)) {                                              \
> > > + if (!oops_in_progress && unlikely(c)) {                         \
> > >           if (debug_locks_off() && !debug_locks_silent)           \
> > >                   WARN_ON(1);                                     \
> > >           __ret = 1;                                              \
> 
> > [    0.000000] Initializing cgroup subsys cpuset
> > [    0.000000] Linux version 2.6.27-rc4-21704-gd25e26b (root@ipp000) (gcc
> version 4.1.2 (Gentoo 4.1.2 p1.1)) #1 SMP Tue Sep 2 16:05:45 HST 2008
> > [    0.000000] Command line: root=/dev/ram0 real_root=/dev/sda3
> init=/linuxrc nmi_watchdog=1 show_msr=1 console=tty0 console=ttyS0,115200n8
> > [    0.000000] KERNEL supported cpus:
> > [    0.000000]   Intel GenuineIntel
> > [    0.000000]   AMD AuthenticAMD
> > [    0.000000]   Centaur CentaurHauls
> > [    0.000000] BIOS-provided physical RAM map:
> > [    0.000000]  BIOS-e820: 0000000000000000 - 000000000009dc00 (usable)
> > [    0.000000]  BIOS-e820: 000000000009dc00 - 00000000000a0000 (reserved)
> > [    0.000000]  BIOS-e820: 00000000000ce000 - 0000000000100000 (reserved)
> > [    0.000000]  BIOS-e820: 0000000000100000 - 00000000cff60000 (usable)
> > [    0.000000]  BIOS-e820: 00000000cff60000 - 00000000cff6a000 (ACPI data)
> > [    0.000000]  BIOS-e820: 00000000cff6a000 - 00000000cff80000 (ACPI NVS)
> > [    0.000000]  BIOS-e820: 00000000cff80000 - 00000000d0000000 (reserved)
> > [    0.000000]  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
> > [    0.000000]  BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
> > [    0.000000]  BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
> > [    0.000000]  BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
> > [    0.000000]  BIOS-e820: 0000000100000000 - 0000000230000000 (usable)
> > [    0.000000] last_pfn = 0x230000 max_arch_pfn = 0x3ffffffff
> > [    0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new
> 0x7010600070106
> > [    0.000000] last_pfn = 0xcff60 max_arch_pfn = 0x3ffffffff
> > [    0.000000] init_memory_mapping
> > [    0.000000]  0000000000 - 00cfe00000 page 2M
> > [    0.000000]  00cfe00000 - 00cff60000 page 4k
> > [    0.000000] kernel direct mapping tables up to cff60000 @ 8000-e000
> > [    0.000000] last_map_addr: cff60000 end: cff60000
> > [    0.000000] init_memory_mapping
> > [    0.000000]  0100000000 - 0230000000 page 2M
> > [    0.000000] kernel direct mapping tables up to 230000000 @ c000-16000
> > [    0.000000] last_map_addr: 230000000 end: 230000000
> > [    0.000000] RAMDISK: 37e0f000 - 37fef7dc
> > [    0.000000] DMI present.
> > [    0.000000] ACPI: RSDP 000F75D0, 0024 (r2 PTLTD )
> > [    0.000000] ACPI: XSDT CFF656FA, 0064 (r1 PTLTD           XSDT   
> 6040000  LTP        0)
> > [    0.000000] ACPI: FACP CFF6970E, 00F4 (r3 NVIDIA CK8S      6040000 PTL_ 
>   F4240)
> > [    0.000000] ACPI: DSDT CFF6575E, 3F3C (r1 NVIDIA      CK8  6040000 MSFT 
> 3000000)
> > [    0.000000] ACPI: FACS CFF6AFC0, 0040
> > [    0.000000] ACPI: SSDT CFF69802, 0574 (r1 AMD    POWERNOW  6040000 AMD  
>       1)
> > [    0.000000] ACPI: SRAT CFF69D76, 0110 (r1 AMD    HAMMER    6040000 AMD  
>       1)
> > [    0.000000] ACPI: SPCR CFF69E86, 0050 (r1 PTLTD  $UCRTBL$  6040000 PTL  
>       1)
> > [    0.000000] ACPI: MCFG CFF69ED6, 003C (r1 PTLTD    MCFG    6040000  LTP 
>       0)
> > [    0.000000] ACPI: HPET CFF69F12, 0038 (r1 PTLTD  HPETTBL   6040000  LTP 
>       1)
> > [    0.000000] ACPI: APIC CFF69F4A, 008E (r1 PTLTD           APIC   
> 6040000  LTP        0)
> > [    0.000000] ACPI: BOOT CFF69FD8, 0028 (r1 PTLTD  $SBFTBL$  6040000  LTP 
>       1)
> > [    0.000000] SRAT: PXM 0 -> APIC 0 -> Node 0
> > [    0.000000] SRAT: PXM 0 -> APIC 1 -> Node 0
> > [    0.000000] SRAT: PXM 1 -> APIC 2 -> Node 1
> > [    0.000000] SRAT: PXM 1 -> APIC 3 -> Node 1
> > [    0.000000] SRAT: Node 0 PXM 0 0-a0000
> > [    0.000000] SRAT: Node 0 PXM 0 100000-d0000000
> > [    0.000000] SRAT: Node 0 PXM 0 100000000-130000000
> > [    0.000000] SRAT: Node 1 PXM 1 130000000-230000000
> > [    0.000000] NUMA: Allocated memnodemap from 11000 - 15680
> > [    0.000000] NUMA: Using 20 for the hash shift.
> > [    0.000000] Bootmem setup node 0 0000000000000000-0000000130000000
> > [    0.000000]   NODE_DATA [0000000000015680 - 000000000002067f]
> > [    0.000000]   bootmap [0000000000021000 -  0000000000046fff] pages 26
> > [    0.000000] (8 early reservations) ==> bootmem [0000000000 - 0130000000]
> > [    0.000000]   #0 [0000000000 - 0000001000]   BIOS data page ==>
> [0000000000 - 0000001000]
> > [    0.000000]   #1 [0000006000 - 0000008000]       TRAMPOLINE ==>
> [0000006000 - 0000008000]
> > [    0.000000]   #2 [0000200000 - 0000f96890]    TEXT DATA BSS ==>
> [0000200000 - 0000f96890]
> > [    0.000000]   #3 [0037e0f000 - 0037fef7dc]          RAMDISK ==>
> [0037e0f000 - 0037fef7dc]
> > [    0.000000]   #4 [000009dc00 - 0000100000]    BIOS reserved ==>
> [000009dc00 - 0000100000]
> > [    0.000000]   #5 [0000008000 - 000000c000]          PGTABLE ==>
> [0000008000 - 000000c000]
> > [    0.000000]   #6 [000000c000 - 0000011000]          PGTABLE ==>
> [000000c000 - 0000011000]
> > [    0.000000]   #7 [0000011000 - 0000015680]       MEMNODEMAP ==>
> [0000011000 - 0000015680]
> > [    0.000000] Bootmem setup node 1 0000000130000000-0000000230000000
> > [    0.000000]   NODE_DATA [0000000130000000 - 000000013000afff]
> > [    0.000000]   bootmap [000000013000b000 -  000000013002afff] pages 20
> > [    0.000000] (8 early reservations) ==> bootmem [0130000000 - 0230000000]
> > [    0.000000]   #0 [0000000000 - 0000001000]   BIOS data page
> > [    0.000000]   #1 [0000006000 - 0000008000]       TRAMPOLINE
> > [    0.000000]   #2 [0000200000 - 0000f96890]    TEXT DATA BSS
> > [    0.000000]   #3 [0037e0f000 - 0037fef7dc]          RAMDISK
> > [    0.000000]   #4 [000009dc00 - 0000100000]    BIOS reserved
> > [    0.000000]   #5 [0000008000 - 000000c000]          PGTABLE
> > [    0.000000]   #6 [000000c000 - 0000011000]          PGTABLE
> > [    0.000000]   #7 [0000011000 - 0000015680]       MEMNODEMAP
> > [    0.000000] found SMP MP-table at [ffff8800000f7600] 000f7600
> > [    0.000000]  [ffffe20000000000-ffffe200071fffff] PMD ->
> [ffff880028200000-ffff88002e1fffff] on node 0
> > [    0.000000]  [ffffe20007200000-ffffe2000d1fffff] PMD ->
> [ffff880130200000-ffff8801361fffff] on node 1
> > [    0.000000] Zone PFN ranges:
> > [    0.000000]   DMA      0x00000000 -> 0x00001000
> > [    0.000000]   DMA32    0x00001000 -> 0x00100000
> > [    0.000000]   Normal   0x00100000 -> 0x00230000
> > [    0.000000] Movable zone start PFN for each node
> > [    0.000000] early_node_map[4] active PFN ranges
> > [    0.000000]     0: 0x00000000 -> 0x0000009d
> > [    0.000000]     0: 0x00000100 -> 0x000cff60
> > [    0.000000]     0: 0x00100000 -> 0x00130000
> > [    0.000000]     1: 0x00130000 -> 0x00230000
> > [    0.000000] On node 0 totalpages: 1048317
> > [    0.000000]   DMA zone: 307 pages, LIFO batch:0
> > [    0.000000]   DMA32 zone: 823232 pages, LIFO batch:31
> > [    0.000000]   Normal zone: 192000 pages, LIFO batch:31
> > [    0.000000] On node 1 totalpages: 1048576
> > [    0.000000]   Normal zone: 1024000 pages, LIFO batch:31
> > [    0.000000] Detected use of extended apic ids on hypertransport bus
> > [    0.000000] Detected use of extended apic ids on hypertransport bus
> > [    0.000000] ACPI: PM-Timer IO Port: 0x1008
> > [    0.000000] ACPI: Local APIC address 0xfee00000
> > [    0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
> > [    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
> > [    0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
> > [    0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
> > [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
> > [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
> > [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
> > [    0.000000] ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
> > [    0.000000] ACPI: IOAPIC (id[0x04] address[0xfec00000] gsi_base[0])
> > [    0.000000] IOAPIC[0]: apic_id 4, version 0, address 0xfec00000, GSI
> 0-23
> > [    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
> > [    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
> > [    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> > [    0.000000] ACPI: IRQ0 used by override.
> > [    0.000000] ACPI: IRQ2 used by override.
> > [    0.000000] ACPI: IRQ9 used by override.
> > [    0.000000] Setting APIC routing to flat
> > [    0.000000] ACPI: HPET id: 0x10de8201 base: 0xfed00000
> > [    0.000000] Using ACPI (MADT) for SMP configuration information
> > [    0.000000] SMP: Allowing 4 CPUs, 0 hotplug CPUs
> > [    0.000000] PM: Registered nosave memory: 000000000009d000 -
> 000000000009e000
> > [    0.000000] PM: Registered nosave memory: 000000000009e000 -
> 00000000000a0000
> > [    0.000000] PM: Registered nosave memory: 00000000000a0000 -
> 00000000000ce000
> > [    0.000000] PM: Registered nosave memory: 00000000000ce000 -
> 0000000000100000
> > [    0.000000] PM: Registered nosave memory: 00000000cff60000 -
> 00000000cff6a000
> > [    0.000000] PM: Registered nosave memory: 00000000cff6a000 -
> 00000000cff80000
> > [    0.000000] PM: Registered nosave memory: 00000000cff80000 -
> 00000000d0000000
> > [    0.000000] PM: Registered nosave memory: 00000000d0000000 -
> 00000000e0000000
> > [    0.000000] PM: Registered nosave memory: 00000000e0000000 -
> 00000000f0000000
> > [    0.000000] PM: Registered nosave memory: 00000000f0000000 -
> 00000000fec00000
> > [    0.000000] PM: Registered nosave memory: 00000000fec00000 -
> 00000000fec10000
> > [    0.000000] PM: Registered nosave memory: 00000000fec10000 -
> 00000000fee00000
> > [    0.000000] PM: Registered nosave memory: 00000000fee00000 -
> 00000000fee01000
> > [    0.000000] PM: Registered nosave memory: 00000000fee01000 -
> 00000000fff80000
> > [    0.000000] PM: Registered nosave memory: 00000000fff80000 -
> 0000000100000000
> > [    0.000000] Allocating PCI resources starting at d1000000 (gap:
> d0000000:10000000)
> > [    0.000000] PERCPU: Allocating 51872 bytes of per cpu data
> > [    0.000000] NR_CPUS: 32, nr_cpu_ids: 4, nr_node_ids 2
> > [    0.000000] Built 2 zonelists in Node order, mobility grouping on. 
> Total pages: 2039539
> > [    0.000000] Policy zone: Normal
> > [    0.000000] Kernel command line: root=/dev/ram0 real_root=/dev/sda3
> init=/linuxrc nmi_watchdog=1 show_msr=1 console=tty0 console=ttyS0,115200n8
> > [    0.000000] Initializing CPU#0
> > [    0.000000] PID hash table entries: 4096 (order: 12, 32768 bytes)
> > [    0.000000] Extended CMOS year: 2000
> > [    0.000000] TSC calibrated against PM_TIMER
> > [    0.000000] Detected 2800.009 MHz processor.
> > [    0.003333] spurious 8259A interrupt: IRQ7.
> > [    0.003333] Console: colour VGA+ 80x25
> > [    0.003333] console [tty0] enabled
> > [    0.003333] console [ttyS0] enabled
> > [    0.003333] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc.,
> Ingo Molnar
> > [    0.003333] ... MAX_LOCKDEP_SUBCLASSES:    8
> > [    0.003333] ... MAX_LOCK_DEPTH:          48
> > [    0.003333] ... MAX_LOCKDEP_KEYS:        8191
> > [    0.003333] ... CLASSHASH_SIZE:           4096
> > [    0.003333] ... MAX_LOCKDEP_ENTRIES:     8192
> > [    0.003333] ... MAX_LOCKDEP_CHAINS:      16384
> > [    0.003333] ... CHAINHASH_SIZE:          8192
> > [    0.003333]  memory used by lock dependency info: 3839 kB
> > [    0.003333]  per task-struct memory footprint: 1920 bytes
> > [    0.003333] Checking aperture...
> > [    0.003333] No AGP bridge found
> > [    0.003333] Node 0: aperture @ 0 size 32 MB
> > [    0.003333] Your BIOS doesn't leave a aperture memory hole
> > [    0.003333] Please enable the IOMMU option in the BIOS setup
> > [    0.003333] This costs you 64 MB of RAM
> > [    0.003333] Mapping aperture over 65536 KB of RAM @ 20000000
> > [    0.003333] PM: Registered nosave memory: 0000000020000000 -
> 0000000024000000
> > [    0.003333] Memory: 8108304k/9175040k available (3896k kernel code,
> 279268k reserved, 2243k data, 676k init)
> > [    0.003333] CPA: page pool initialized 1 of 1 pages preallocated
> > [    0.003333] SLUB: Genslabs=13, HWalign=64, Order=0-3, MinObjects=0,
> CPUs=4, Nodes=2
> > [    0.003333] hpet clockevent registered
> > [    0.003333] Calibrating delay loop (skipped), value calculated using
> timer frequency.. 5602.35 BogoMIPS (lpj=9333363)
> > [    0.006698] Security Framework initialized
> > [    0.010737] Dentry cache hash table entries: 1048576 (order: 11, 8388608
> bytes)
> > [    0.018937] Inode-cache hash table entries: 524288 (order: 10, 4194304
> bytes)
> > [    0.025421] Mount-cache hash table entries: 256
> > [    0.026974] Initializing cgroup subsys ns
> > [    0.030010] Initializing cgroup subsys cpuacct
> > [    0.033335] Initializing cgroup subsys memory
> > [    0.036674] Initializing cgroup subsys devices
> > [    0.040006] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64
> bytes/line)
> > [    0.043332] CPU: L2 Cache: 1024K (64 bytes/line)
> > [    0.046666] CPU 0/0 -> Node 0
> > [    0.049999] tseg: 00cff80000
> > [    0.050000] CPU: Physical Processor ID: 0
> > [    0.053331] CPU: Processor Core ID: 0
> > [    0.056670] using C1E aware idle routine
> > [    0.060018] ACPI: Core revision 20080609
> > [    0.070530] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> > [    0.108872] activating NMI Watchdog ... done.
> > [    0.109994] CPU0: Dual-Core AMD Opteron(tm) Processor 2220 stepping 03
> > [    0.116660] Using local APIC timer interrupts.
> > [    0.123326] APIC timer calibration result 12500076
> > [    0.123328] Detected 12.500 MHz APIC timer.
> > [    0.126660] APIC timer registered as dummy, due to nmi_watchdog=1!
> > [    0.130154] lockdep: fixing up alternatives.
> > [    0.133347] Booting processor 1/1 ip 6000
> > [    0.003333] Initializing CPU#1
> > [    0.003333] Calibrating delay using timer specific routine.. 5602.36
> BogoMIPS (lpj=9333385)
> > [    0.003333] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64
> bytes/line)
> > [    0.003333] CPU: L2 Cache: 1024K (64 bytes/line)
> > [    0.003333] CPU 1/1 -> Node 0
> > [    0.003333] CPU: Physical Processor ID: 0
> > [    0.003333] CPU: Processor Core ID: 1
> > [    0.003333] x86 PAT enabled: cpu 1, old 0x7040600070406, new
> 0x7010600070106
> > [    0.230372] CPU1: Dual-Core AMD Opteron(tm) Processor 2220 stepping 03
> > [    0.238145] lockdep: fixing up alternatives.
> > [    0.239999] Booting processor 2/2 ip 6000
> > [    0.003333] Initializing CPU#2
> > [    0.003333] Calibrating delay using timer specific routine.. 5602.44
> BogoMIPS (lpj=9333513)
> > [    0.003333] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64
> bytes/line)
> > [    0.003333] CPU: L2 Cache: 1024K (64 bytes/line)
> > [    0.003333] CPU 2/2 -> Node 1
> > [    0.003333] CPU: Physical Processor ID: 1
> > [    0.003333] CPU: Processor Core ID: 0
> > [    0.003333] x86 PAT enabled: cpu 2, old 0x7040600070406, new
> 0x7010600070106
> > [    0.340530] CPU2: Dual-Core AMD Opteron(tm) Processor 2220 stepping 03
> > [    0.348138] lockdep: fixing up alternatives.
> > [    0.349997] Booting processor 3/3 ip 6000
> > [    0.003333] Initializing CPU#3
> > [    0.003333] Calibrating delay using timer specific routine.. 5602.48
> BogoMIPS (lpj=9333575)
> > [    0.003333] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64
> bytes/line)
> > [    0.003333] CPU: L2 Cache: 1024K (64 bytes/line)
> > [    0.003333] CPU 3/3 -> Node 1
> > [    0.003333] CPU: Physical Processor ID: 1
> > [    0.003333] CPU: Processor Core ID: 1
> > [    0.003333] x86 PAT enabled: cpu 3, old 0x7040600070406, new
> 0x7010600070106
> > [    0.447118] CPU3: Dual-Core AMD Opteron(tm) Processor 2220 stepping 03
> > [    0.454734] Brought up 4 CPUs
> > [    0.456640] Total of 4 processors activated (22409.64 BogoMIPS).
> > [    0.460033] Testing NMI watchdog ... 
> > [    0.532432] WARNING: CPU#0: NMI appears to be stuck (0->0)!
> > [    0.533301] Please report this to bugzilla.kernel.org,
> > [    0.536635] and attach the output of the 'dmesg' command.
> > [    0.539968] 
> > [    0.543302] WARNING: CPU#1: NMI appears to be stuck (0->0)!
> > [    0.546635] Please report this to bugzilla.kernel.org,
> > [    0.549968] and attach the output of the 'dmesg' command.
> > [    0.553301] 
> > [    0.554980] WARNING: CPU#2: NMI appears to be stuck (0->0)!
> > [    0.556635] Please report this to bugzilla.kernel.org,
> > [    0.559968] and attach the output of the 'dmesg' command.
> > [    0.563301] 
> > [    0.566635] WARNING: CPU#3: NMI appears to be stuck (0->0)!
> > [    0.569968] Please report this to bugzilla.kernel.org,
> > [    0.573301] and attach the output of the 'dmesg' command.
> > [    0.577183] net_namespace: 968 bytes
> > [    0.580160] xor: automatically using best checksumming function:
> generic_sse
> > [    0.599969]    generic_sse:  8630.400 MB/sec
> > [    0.603302] xor: using function: generic_sse (8630.400 MB/sec)
> > [    0.606692] NET: Registered protocol family 16
> > [    0.610254] No dock devices found.
> > [    0.613503] node 0 link 2: io port [1000, 3fff]
> > [    0.613506] TOM: 00000000d0000000 aka 3328M
> > [    0.616636] node 0 link 2: mmio [d0000000, dfffffff]
> > [    0.616638] node 0 link 2: mmio [a0000, bffff]
> > [    0.616640] node 0 link 2: mmio [e0000000, e7ffffff]
> > [    0.616643] TOM2: 0000000230000000 aka 8960M
> > [    0.619969] bus: [00,ff] on node 0 link 2
> > [    0.619970] bus: 00 index 0 io port: [0, ffff]
> > [    0.619972] bus: 00 index 1 mmio: [d0000000, ffffffff]
> > [    0.619974] bus: 00 index 2 mmio: [a0000, bffff]
> > [    0.619975] bus: 00 index 3 mmio: [230000000, fcffffffff]
> > [    0.619989] ACPI: bus type pci registered
> > [    0.623448] PCI: MCFG configuration 0: base e0000000 segment 0 buses 0 -
> 4
> > [    0.626636] PCI: MCFG area at e0000000 reserved in E820
> > [    0.630065] PCI: Using MMCONFIG at e0000000 - e04fffff
> > [    0.633302] PCI: Using configuration type 1 for base access
> > [    0.643980] ACPI: EC: Look up EC in DSDT
> > [    0.648409] ACPI: Interpreter enabled
> > [    0.649969] ACPI: (supports S0 S1 S3 S4 S5)
> > [    0.656635] ACPI: Using IOAPIC for interrupt routing
> > [    0.670034] ACPI: PCI Root Bridge [PCI0] (0000:00)
> > [    0.673616] pci 0000:00:01.1: PME# supported from D3hot D3cold
> > [    0.676644] pci 0000:00:01.1: PME# disabled
> > [    0.680009] pci 0000:00:02.0: supports D1
> > [    0.680010] pci 0000:00:02.0: supports D2
> > [    0.680012] pci 0000:00:02.0: PME# supported from D0 D1 D2 D3hot D3cold
> > [    0.683303] pci 0000:00:02.0: PME# disabled
> > [    0.686672] pci 0000:00:02.1: supports D1
> > [    0.686673] pci 0000:00:02.1: supports D2
> > [    0.686675] pci 0000:00:02.1: PME# supported from D0 D1 D2 D3hot D3cold
> > [    0.689969] pci 0000:00:02.1: PME# disabled
> > [    0.693535] pci 0000:00:08.0: supports D1
> > [    0.693537] pci 0000:00:08.0: supports D2
> > [    0.693538] pci 0000:00:08.0: PME# supported from D0 D1 D2 D3hot D3cold
> > [    0.696637] pci 0000:00:08.0: PME# disabled
> > [    0.700017] pci 0000:00:09.0: supports D1
> > [    0.700019] pci 0000:00:09.0: supports D2
> > [    0.700020] pci 0000:00:09.0: PME# supported from D0 D1 D2 D3hot D3cold
> > [    0.703304] pci 0000:00:09.0: PME# disabled
> > [    0.706664] pci 0000:00:0a.0: PME# supported from D0 D1 D2 D3hot D3cold
> > [    0.709969] pci 0000:00:0a.0: PME# disabled
> > [    0.713331] pci 0000:00:0d.0: PME# supported from D0 D1 D2 D3hot D3cold
> > [    0.716636] pci 0000:00:0d.0: PME# disabled
> > [    0.719997] pci 0000:00:0f.0: PME# supported from D0 D1 D2 D3hot D3cold
> > [    0.723302] pci 0000:00:0f.0: PME# disabled
> > [    0.726878] pci 0000:01:04.0: supports D1
> > [    0.726879] pci 0000:01:04.0: supports D2
> > [    0.726904] pci 0000:00:06.0: transparent bridge
> > [    0.729969] PCI: bridge 0000:00:06.0 io port: [3000, 3fff]
> > [    0.733303] PCI: bridge 0000:00:06.0 32bit mmio: [d0100000, d01fffff]
> > [    0.736636] PCI: bridge 0000:00:06.0 32bit mmio pref: [d8000000,
> dfffffff]
> > [    0.740034] pci 0000:02:00.0: supports D1
> > [    0.740064] PCI: bridge 0000:00:0a.0 32bit mmio: [d0200000, d02fffff]
> > [    0.743388] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
> > [    0.743462] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P2P0._PRT]
> > [    0.743486] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.XVR0._PRT]
> > [    0.743513] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.XVR2._PRT]
> > [    0.743540] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.XVR5._PRT]
> > [    0.769795] ACPI: PCI Interrupt Link [LNK1] (IRQs 5 7 10 11 14 15 16 17
> 18 19 20 21 22 23) *0, disabled.
> > [    0.777165] ACPI: PCI Interrupt Link [LNK2] (IRQs 5 7 *10 11 14 15 16 17
> 18 19 20 21 22 23)
> > [    0.786790] ACPI: PCI Interrupt Link [LNK3] (IRQs 5 7 10 11 14 15 16 17
> 18 19 20 21 22 23) *0, disabled.
> > [    0.797164] ACPI: PCI Interrupt Link [LNK4] (IRQs 5 7 10 11 14 15 16 17
> 18 19 20 21 22 23) *0, disabled.
> > [    0.808626] ACPI: PCI Interrupt Link [LK1E] (IRQs 5 7 10 11 14 15 16 17
> 18 19 20 21 22 23) *0, disabled.
> > [    0.816220] ACPI: PCI Interrupt Link [LK2E] (IRQs 5 7 10 *11 14 15 16 17
> 18 19 20 21 22 23)
> > [    0.822660] ACPI: PCI Interrupt Link [LK3E] (IRQs 5 7 10 11 14 15 16 17
> 18 19 20 21 22 23) *0, disabled.
> > [    0.830122] ACPI: PCI Interrupt Link [LK4E] (IRQs 5 7 10 11 14 15 16 17
> 18 19 20 21 22 23) *0, disabled.
> > [    0.840505] ACPI: PCI Interrupt Link [LSMB] (IRQs 5 7 10 11 14 15 16 17
> 18 19 20 21 22 23) *0, disabled.
> > [    0.851950] ACPI: PCI Interrupt Link [LUS0] (IRQs 5 7 *10 11 14 15 16 17
> 18 19 20 21 22 23)
> > [    0.858235] ACPI: PCI Interrupt Link [LMA2] (IRQs 5 7 10 *11 14 15 16 17
> 18 19 20 21 22 23)
> > [    0.867922] ACPI: PCI Interrupt Link [LUS2] (IRQs *5 7 10 11 14 15 16 17
> 18 19 20 21 22 23)
> > [    0.877614] ACPI: PCI Interrupt Link [LMAC] (IRQs 5 7 *10 11 14 15 16 17
> 18 19 20 21 22 23)
> > [    0.887303] ACPI: PCI Interrupt Link [LAZA] (IRQs 5 7 10 11 14 15 16 17
> 18 19 20 21 22 23) *0, disabled.
> > [    0.898166] ACPI: PCI Interrupt Link [LPID] (IRQs 5 7 10 11 14 15 16 17
> 18 19 20 21 22 23) *0, disabled.
> > [    0.909241] ACPI: PCI Interrupt Link [LTID] (IRQs 5 7 10 *11 14 15 16 17
> 18 19 20 21 22 23)
> > [    0.915682] ACPI: PCI Interrupt Link [LSI1] (IRQs 5 7 *10 11 14 15 16 17
> 18 19 20 21 22 23)
> > [    0.921892] ACPI: PCI Interrupt Link [LSI2] (IRQs 5 7 10 *11 14 15 16 17
> 18 19 20 21 22 23)
> > [    0.928367] Linux Plug and Play Support v0.97 (c) Adam Belay
> > [    0.930030] pnp: PnP ACPI init
> > [    0.933313] ACPI: bus type pnp registered
> > [    0.942967] pnp: PnP ACPI: found 15 devices
> > [    0.943302] ACPI: ACPI bus type pnp unregistered
> > [    0.946899] SCSI subsystem initialized
> > [    0.950132] libata version 3.00 loaded.
> > [    0.953483] usbcore: registered new interface driver usbfs
> > [    0.956699] usbcore: registered new interface driver hub
> > [    0.960060] usbcore: registered new device driver usb
> > [    0.963671] PCI: Using ACPI for IRQ routing
> > [    0.980227] PCI-DMA: Disabling AGP.
> > [    0.984431] PCI-DMA: aperture base @ 20000000 size 65536 KB
> > [    0.986635] PCI-DMA: using GART IOMMU.
> > [    0.989971] PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture
> > [    0.993870] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 31
> > [    1.000341] hpet0: 3 32-bit timers, 25000000 Hz
> > [    1.004395] ACPI: RTC can wake from S4
> > [    1.006637] Clockevents: could not switch to one-shot
> mode:<6>Clockevents: could not switch to one-shot mode: lapic is not
> functional.
> > [    1.009524] Could not switch to high resolution mode on CPU 3
> > [    1.009528] Clockevents: could not switch to one-shot mode: lapic is not
> functional.
> > [    1.009531] Could not switch to high resolution mode on CPU 1
> > [    1.009534] Clockevents: could not switch to one-shot mode: lapic is not
> functional.
> > [    1.009538] Could not switch to high resolution mode on CPU 2
> > [    1.009969]  lapic is not functional.
> > [    1.056944] Could not switch to high resolution mode on CPU 0
> > [    1.076221] system 00:00: iomem range 0xffc00000-0xffffffff could not be
> reserved
> > [    1.084054] system 00:00: iomem range 0xfec00000-0xfec00fff could not be
> reserved
> > [    1.091883] system 00:00: iomem range 0xfee00000-0xfeefffff could not be
> reserved
> > [    1.099719] system 00:00: iomem range 0xfed00000-0xfed00fff has been
> reserved
> > [    1.107052] system 00:03: iomem range 0xd0000000-0xd0007fff has been
> reserved
> > [    1.114383] system 00:04: ioport range 0x1000-0x107f has been reserved
> > [    1.121101] system 00:04: ioport range 0x1080-0x10ff has been reserved
> > [    1.127817] system 00:04: ioport range 0x1400-0x147f has been reserved
> > [    1.134537] system 00:04: ioport range 0x1480-0x14ff has been reserved
> > [    1.141258] system 00:04: ioport range 0x1800-0x187f has been reserved
> > [    1.147976] system 00:04: ioport range 0x1880-0x18ff has been reserved
> > [    1.154698] system 00:04: ioport range 0x2440-0x247f has been reserved
> > [    1.161420] system 00:04: ioport range 0x2400-0x243f has been reserved
> > [    1.168151] system 00:07: ioport range 0x4d0-0x4d1 has been reserved
> > [    1.174695] system 00:07: ioport range 0xc05-0xc06 has been reserved
> > [    1.182277] pci 0000:00:06.0: PCI bridge, secondary bus 0000:01
> > [    1.188388] pci 0000:00:06.0:   IO window: 0x3000-0x3fff
> > [    1.193889] pci 0000:00:06.0:   MEM window: 0xd0100000-0xd01fffff
> > [    1.200176] pci 0000:00:06.0:   PREFETCH window:
> 0x000000d8000000-0x000000dfffffff
> > [    1.208089] pci 0000:00:0a.0: PCI bridge, secondary bus 0000:02
> > [    1.214205] pci 0000:00:0a.0:   IO window: disabled
> > [    1.219275] pci 0000:00:0a.0:   MEM window: 0xd0200000-0xd02fffff
> > [    1.225558] pci 0000:00:0a.0:   PREFETCH window:
> 0x000000d1000000-0x000000d10fffff
> > [    1.233474] pci 0000:00:0d.0: PCI bridge, secondary bus 0000:03
> > [    1.239598] pci 0000:00:0d.0:   IO window: disabled
> > [    1.244666] pci 0000:00:0d.0:   MEM window: disabled
> > [    1.249825] pci 0000:00:0d.0:   PREFETCH window: disabled
> > [    1.255416] pci 0000:00:0f.0: PCI bridge, secondary bus 0000:04
> > [    1.261525] pci 0000:00:0f.0:   IO window: disabled
> > [    1.266594] pci 0000:00:0f.0:   MEM window: disabled
> > [    1.271751] pci 0000:00:0f.0:   PREFETCH window: disabled
> > [    1.277347] pci 0000:00:06.0: setting latency timer to 64
> > [    1.277352] pci 0000:00:0a.0: setting latency timer to 64
> > [    1.277357] pci 0000:00:0d.0: setting latency timer to 64
> > [    1.277363] pci 0000:00:0f.0: setting latency timer to 64
> > [    1.277365] bus: 00 index 0 io port: [0, ffff]
> > [    1.282004] bus: 00 index 1 mmio: [0, ffffffffffffffff]
> > [    1.292287] bus: 01 index 0 io port: [3000, 3fff]
> > [    1.297187] bus: 01 index 1 mmio: [d0100000, d01fffff]
> > [    1.302519] bus: 01 index 2 mmio: [d8000000, dfffffff]
> > [    1.307849] bus: 01 index 3 io port: [0, ffff]
> > [    1.312483] bus: 01 index 4 mmio: [0, ffffffffffffffff]
> > [    1.317900] bus: 02 index 0 mmio: [0, 0]
> > [    1.322017] bus: 02 index 1 mmio: [d0200000, d02fffff]
> > [    1.327349] bus: 02 index 2 mmio: [d1000000, d10fffff]
> > [    1.332678] bus: 02 index 3 mmio: [0, 0]
> > [    1.336793] bus: 03 index 0 mmio: [0, 0]
> > [    1.340910] bus: 03 index 1 mmio: [0, 0]
> > [    1.345025] bus: 03 index 2 mmio: [0, 0]
> > [    1.349142] bus: 03 index 3 mmio: [0, 0]
> > [    1.353256] bus: 04 index 0 mmio: [0, 0]
> > [    1.357374] bus: 04 index 1 mmio: [0, 0]
> > [    1.361490] bus: 04 index 2 mmio: [0, 0]
> > [    1.365609] bus: 04 index 3 mmio: [0, 0]
> > [    1.369739] NET: Registered protocol family 2
> > [    1.413137] IP route cache hash table entries: 262144 (order: 9, 2097152
> bytes)
> > [    1.423160] TCP established hash table entries: 524288 (order: 11,
> 8388608 bytes)
> > [    1.436206] TCP bind hash table entries: 65536 (order: 9, 3670016 bytes)
> > [    1.445874] TCP: Hash tables configured (established 524288 bind 65536)
> > [    1.452716] TCP reno registered
> > [    1.466350] NET: Registered protocol family 1
> > [    1.471023] checking if image is initramfs... it is
> > [    1.572646] Freeing initrd memory: 1921k freed
> > [    1.578336] Simple Boot Flag at 0x37 set to 0x80
> > [    1.585905] audit: initializing netlink socket (disabled)
> > [    1.591560] type=2000 audit(1221176947.589:1): initialized
> > [    1.602577] HugeTLB registered 2 MB page size, pre-allocated 0 pages
> > [    1.613210] VFS: Disk quotas dquot_6.5.1
> > [    1.617471] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
> > [    1.625780] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
> > [    1.632809] msgmni has been set to 15840
> > [    1.637191] async_tx: api initialized (async)
> > [    1.641747] io scheduler noop registered
> > [    1.645871] io scheduler anticipatory registered
> > [    1.650694] io scheduler deadline registered
> > [    1.655347] io scheduler cfq registered (default)
> > [    1.660281] pci 0000:00:00.0: Enabling HT MSI Mapping
> > [    1.879582] pci 0000:00:05.0: Enabling HT MSI Mapping
> > [    1.884851] pci 0000:00:05.1: Enabling HT MSI Mapping
> > [    1.890123] pci 0000:00:05.2: Enabling HT MSI Mapping
> > [    1.895393] pci 0000:00:06.0: Enabling HT MSI Mapping
> > [    1.900669] pci 0000:00:08.0: Enabling HT MSI Mapping
> > [    1.905945] pci 0000:00:09.0: Enabling HT MSI Mapping
> > [    1.911213] pci 0000:00:0a.0: Enabling HT MSI Mapping
> > [    1.916486] pci 0000:00:0d.0: Enabling HT MSI Mapping
> > [    1.921761] pci 0000:00:0f.0: Enabling HT MSI Mapping
> > [    1.927029] pci 0000:01:04.0: Boot video device
> > [    1.927212] pcieport-driver 0000:00:0a.0: setting latency timer to 64
> > [    1.927241] pcieport-driver 0000:00:0a.0: found MSI capability
> > [    1.933324] pci_express 0000:00:0a.0:pcie00: allocate port service
> > [    1.933414] pci_express 0000:00:0a.0:pcie01: allocate port service
> > [    1.933475] pci_express 0000:00:0a.0:pcie03: allocate port service
> > [    1.933576] pcieport-driver 0000:00:0d.0: setting latency timer to 64
> > [    1.933604] pcieport-driver 0000:00:0d.0: found MSI capability
> > [    1.939667] pci_express 0000:00:0d.0:pcie00: allocate port service
> > [    1.939741] pci_express 0000:00:0d.0:pcie01: allocate port service
> > [    1.939801] pci_express 0000:00:0d.0:pcie03: allocate port service
> > [    1.939898] pcieport-driver 0000:00:0f.0: setting latency timer to 64
> > [    1.939925] pcieport-driver 0000:00:0f.0: found MSI capability
> > [    1.945977] pci_express 0000:00:0f.0:pcie00: allocate port service
> > [    1.946048] pci_express 0000:00:0f.0:pcie01: allocate port service
> > [    1.946108] pci_express 0000:00:0f.0:pcie03: allocate port service
> > [    1.947934] aer 0000:00:0a.0:pcie01: AER service couldn't init device:
> no _OSC support
> > [    1.949554] aer 0000:00:0d.0:pcie01: AER service couldn't init device:
> no _OSC support
> > [    1.951148] aer 0000:00:0f.0:pcie01: AER service couldn't init device:
> no _OSC support
> > [    1.951606] input: Power Button (FF) as /class/input/input0
> > [    1.957384] ACPI: Power Button (FF) [PWRF]
> > [    1.961801] input: Power Button (CM) as /class/input/input1
> > [    1.967565] ACPI: Power Button (CM) [PWRB]
> > [    1.972294] ACPI: CPU0 (power states: C1[C1] C2[C2] C3[C3])
> > [    1.978363] processor ACPI0007:00: registered as cooling_device0
> > [    1.984576] ACPI: Processor [C000] (supports 2 throttling states)
> > [    1.991063] processor ACPI0007:01: registered as cooling_device1
> > [    1.997366] processor ACPI0007:02: registered as cooling_device2
> > [    2.003684] processor ACPI0007:03: registered as cooling_device3
> > [    2.079999] hpet_resources: 0xfed00000 is busy
> > [    2.080193] Non-volatile memory driver v1.2
> > [    2.084715] Linux agpgart interface v0.103
> > [    2.089107] Serial: 8250/16550 driver4 ports, IRQ sharing disabled
> > [    2.095680] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
> > [    2.102659] 00:0c: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
> > [    2.108789] Floppy drive(s): fd0 is 1.44M
> > [    2.130069] FDC 0 is a post-1991 82077
> > [    2.137592] brd: module loaded
> > [    2.142041] loop: module loaded
> > [    2.145379] tun: Universal TUN/TAP device driver, 1.6
> > [    2.150619] tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
> > [    2.157181] console [netcon0] enabled
> > [    2.161035] netconsole: network logging started
> > [    2.165762] Uniform Multi-Platform E-IDE driver
> > [    2.170615] ide_generic: please use "probe_mask=0x3f" module parameter
> for probing all legacy ISA IDE ports
> > [    2.180715] Probing IDE interface ide0...
> > [    3.149712] hdb: CD-224E-N, ATAPI CD/DVD-ROM drive
> > [    3.206282] Probing IDE interface ide1...
> > [    3.739617] ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
> > [    3.744443] ide1 at 0x170-0x177,0x376 on irq 15
> > [    3.751258] hdb: ATAPI 24X CD-ROM drive, 256kB Cache
> > [    3.756646] Uniform CD-ROM driver Revision: 3.20
> > [    3.771462] megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16 00:01:03
> EST 2006)
> > [    3.779263] megaraid: 2.20.5.1 (Release Date: Thu Nov 16 15:32:35 EST
> 2006)
> > [    3.786512] megasas: 00.00.04.01 Thu July 24 11:41:51 PST 2008
> > [    3.792690] Driver 'sd' needs updating - please use bus_type methods
> > [    3.799315] Driver 'sr' needs updating - please use bus_type methods
> > [    3.806449] sata_nv 0000:00:05.0: version 3.5
> > [    3.806741] ACPI: PCI Interrupt Link [LTID] enabled at IRQ 23
> > [    3.812700] sata_nv 0000:00:05.0: PCI INT A -> Link[LTID] -> GSI 23
> (level, low) -> IRQ 23
> > [    3.821311] sata_nv 0000:00:05.0: Using SWNCQ mode
> > [    3.826430] sata_nv 0000:00:05.0: setting latency timer to 64
> > [    3.826613] scsi0 : sata_nv
> > [    3.829796] scsi1 : sata_nv
> > [    3.832986] ata1: SATA max UDMA/133 cmd 0x24d0 ctl 0x24c4 bmdma 0x2490
> irq 23
> > [    3.840314] ata2: SATA max UDMA/133 cmd 0x24c8 ctl 0x24c0 bmdma 0x2498
> irq 23
> > [    4.179871] ata1: SATA link down (SStatus 0 SControl 300)
> > [    4.516536] ata2: SATA link down (SStatus 0 SControl 300)
> > [    4.522429] ACPI: PCI Interrupt Link [LSI1] enabled at IRQ 22
> > [    4.528388] sata_nv 0000:00:05.1: PCI INT B -> Link[LSI1] -> GSI 22
> (level, low) -> IRQ 22
> > [    4.536997] sata_nv 0000:00:05.1: Using SWNCQ mode
> > [    4.542083] sata_nv 0000:00:05.1: setting latency timer to 64
> > [    4.542220] scsi2 : sata_nv
> > [    4.545334] scsi3 : sata_nv
> > [    4.548512] ata3: SATA max UDMA/133 cmd 0x24e8 ctl 0x24dc bmdma 0x24a0
> irq 22
> > [    4.555840] ata4: SATA max UDMA/133 cmd 0x24e0 ctl 0x24d8 bmdma 0x24a8
> irq 22
> > [    4.896535] ata3: SATA link down (SStatus 0 SControl 300)
> > [    5.233200] ata4: SATA link down (SStatus 0 SControl 300)
> > [    5.239087] ACPI: PCI Interrupt Link [LSI2] enabled at IRQ 21
> > [    5.245047] sata_nv 0000:00:05.2: PCI INT C -> Link[LSI2] -> GSI 21
> (level, low) -> IRQ 21
> > [    5.253654] sata_nv 0000:00:05.2: Using SWNCQ mode
> > [    5.258744] sata_nv 0000:00:05.2: setting latency timer to 64
> > [    5.258881] scsi4 : sata_nv
> > [    5.261998] scsi5 : sata_nv
> > [    5.265174] ata5: SATA max UDMA/133 cmd 0x2800 ctl 0x24f4 bmdma 0x24b0
> irq 21
> > [    5.272504] ata6: SATA max UDMA/133 cmd 0x24f8 ctl 0x24f0 bmdma 0x24b8
> irq 21
> > [    5.613197] ata5: SATA link down (SStatus 0 SControl 300)
> > [    5.949864] ata6: SATA link down (SStatus 0 SControl 300)
> > [    5.955612] pata_amd 0000:00:04.0: version 0.3.10
> > [    5.955640] pata_amd 0000:00:04.0: BAR 0: can't reserve I/O region
> [0x1f0-0x1f7]
> > [    5.963385] pata_amd 0000:00:04.0: failed to request/iomap BARs for port
> 0 (errno=-16)
> > [    5.971653] pata_amd 0000:00:04.0: BAR 2: can't reserve I/O region
> [0x170-0x177]
> > [    5.979396] pata_amd 0000:00:04.0: failed to request/iomap BARs for port
> 1 (errno=-16)
> > [    5.987656] pata_amd 0000:00:04.0: no available native port
> > [    5.993731] Fusion MPT base driver 3.04.07
> > [    5.998021] Copyright (c) 1999-2008 LSI Corporation
> > [    6.003109] Fusion MPT SPI Host driver 3.04.07
> > [    6.007833] Fusion MPT SAS Host driver 3.04.07
> > [    6.013164] ieee1394: raw1394: /dev/raw1394 device initialized
> > [    6.019695] ACPI: PCI Interrupt Link [LUS2] enabled at IRQ 20
> > [    6.025650] ehci_hcd 0000:00:02.1: PCI INT B -> Link[LUS2] -> GSI 20
> (level, low) -> IRQ 20
> > [    6.034353] ehci_hcd 0000:00:02.1: setting latency timer to 64
> > [    6.034355] ehci_hcd 0000:00:02.1: EHCI Host Controller
> > [    6.040002] ehci_hcd 0000:00:02.1: new USB bus registered, assigned bus
> number 1
> > [    6.047776] ehci_hcd 0000:00:02.1: debug port 1
> > [    6.052503] ehci_hcd 0000:00:02.1: cache line size of 64 is not
> supported
> > [    6.052521] ehci_hcd 0000:00:02.1: irq 20, io mem 0xd0041000
> > [    6.066176] ehci_hcd 0000:00:02.1: USB 2.0 started, EHCI 1.00, driver 10
> Dec 2004
> > [    6.074154] usb usb1: configuration #1 chosen from 1 choice
> > [    6.080016] hub 1-0:1.0: USB hub found
> > [    6.083976] hub 1-0:1.0: 10 ports detected
> > [    6.189886] ohci_hcd: 2006 August 04 USB 1.1 'Open' Host Controller
> (OHCI) Driver
> > [    6.190142] ACPI: PCI Interrupt Link [LUS0] enabled at IRQ 19
> > [    6.196100] ohci_hcd 0000:00:02.0: PCI INT A -> Link[LUS0] -> GSI 19
> (level, low) -> IRQ 19
> > [    6.204801] ohci_hcd 0000:00:02.0: setting latency timer to 64
> > [    6.204804] ohci_hcd 0000:00:02.0: OHCI Host Controller
> > [    6.210376] ohci_hcd 0000:00:02.0: new USB bus registered, assigned bus
> number 2
> > [    6.218132] ohci_hcd 0000:00:02.0: irq 19, io mem 0xd0040000
> > [    6.278315] usb usb2: configuration #1 chosen from 1 choice
> > [    6.284165] hub 2-0:1.0: USB hub found
> > [    6.288118] hub 2-0:1.0: 10 ports detected
> > [    6.393142] USB Universal Host Controller Interface driver v3.0
> > [    6.399423] usbcore: registered new interface driver usblp
> > [    6.405107] Initializing USB Mass Storage driver...
> > [    6.410243] usbcore: registered new interface driver usb-storage
> > [    6.416443] USB Mass Storage support registered.
> > [    6.421406] PNP: PS/2 Controller [PNP0303:KBC0,PNP0f13:MSE0] at
> 0x60,0x64 irq 1,12
> > [    6.679856] serio: i8042 KBD port at 0x60,0x64 irq 1
> > [    6.685238] mice: PS/2 mouse device common for all mice
> > [    6.877983] md: linear personality registered for level -1
> > [    6.883663] md: raid0 personality registered for level 0
> > [    6.889166] md: raid1 personality registered for level 1
> > [    6.894669] md: raid10 personality registered for level 10
> > [    6.956180] raid6: int64x1   2587 MB/s
> > [    7.016180] raid6: int64x2   3291 MB/s
> > [    7.076168] raid6: int64x4   3160 MB/s
> > [    7.136168] raid6: int64x8   2360 MB/s
> > [    7.196178] raid6: sse2x1    3858 MB/s
> > [    7.256175] raid6: sse2x2    5139 MB/s
> > [    7.316176] raid6: sse2x4    5262 MB/s
> > [    7.320122] raid6: using algorithm sse2x4 (5262 MB/s)
> > [    7.325365] md: raid6 personality registered for level 6
> > [    7.335744] md: raid5 personality registered for level 5
> > [    7.341252] md: raid4 personality registered for level 4
> > [    7.346753] md: multipath personality registered for level -4
> > [    7.352960] device-mapper: ioctl: 4.14.0-ioctl (2008-04-23) initialised:
> dm-devel@redhat.com
> > [    7.361877] cpuidle: using governor ladder
> > [    7.366402] cpuidle: using governor menu
> > [    7.370591] usbcore: registered new interface driver usbhid
> > [    7.376362] usbhid: v2.6:USB HID core driver
> > [    7.380936] oprofile: using NMI interrupt.
> > [    7.385695] TCP cubic registered
> > [    7.389332] NET: Registered protocol family 10
> > [    7.394985] IPv6 over IPv4 tunneling driver
> > [    7.400135] NET: Registered protocol family 17
> > [    7.405122] RPC: Registered udp transport module.
> > [    7.410021] RPC: Registered tcp transport module.
> > [    7.415002] powernow-k8: Found 2 Dual-Core AMD Opteron(tm) Processor
> 2220 processors (4 cpu cores) (version 2.20.00)
> > [    7.425975] powernow-k8:    0 : fid 0x14 (2800 MHz), vid 0x8
> > [    7.431831] powernow-k8:    1 : fid 0x12 (2600 MHz), vid 0xa
> > [    7.437687] powernow-k8:    2 : fid 0x10 (2400 MHz), vid 0xc
> > [    7.443537] powernow-k8:    3 : fid 0xe (2200 MHz), vid 0xe
> > [    7.449301] powernow-k8:    4 : fid 0xc (2000 MHz), vid 0x10
> > [    7.455149] powernow-k8:    5 : fid 0xa (1800 MHz), vid 0x10
> > [    7.460999] powernow-k8:    6 : fid 0x2 (1000 MHz), vid 0x12
> > [    7.467050] powernow-k8:    0 : fid 0x14 (2800 MHz), vid 0x8
> > [    7.472929] powernow-k8:    1 : fid 0x12 (2600 MHz), vid 0xa
> > [    7.478785] powernow-k8:    2 : fid 0x10 (2400 MHz), vid 0xc
> > [    7.484637] powernow-k8:    3 : fid 0xe (2200 MHz), vid 0xe
> > [    7.490411] powernow-k8:    4 : fid 0xc (2000 MHz), vid 0x10
> > [    7.496279] powernow-k8:    5 : fid 0xa (1800 MHz), vid 0x10
> > [    7.502134] powernow-k8:    6 : fid 0x2 (1000 MHz), vid 0x12
> > [    7.508739] drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
> > [    7.515289] Freeing unused kernel memory: 676k freed
> > [    7.520818] Write protecting the kernel read-only data: 5672k
> > [    7.742843] Clocksource tsc unstable (delta = -111149082 ns)
> > [    8.177545] ACPI: PCI Interrupt Link [LK3E] enabled at IRQ 18
> > [    8.177654] arcmsr 0000:02:00.0: PCI INT A -> Link[LK3E] -> GSI 18
> (level, low) -> IRQ 18
> > [    8.177666] arcmsr 0000:02:00.0: setting latency timer to 64
> > [    8.189529] ARECA RAID ADAPTER6: FIRMWARE VERSION V1.44 2008-1-31  
> > [    8.202850] scsi6 : Areca SATA Host Adapter RAID Controller( RAID6
> capable)
> > [    8.202853]  Driver Version 1.20.00.15 2008/02/27
> > [    8.205018] scsi 6:0:0:0: Direct-Access     Areca    ARC-1280-VOL#00 
> R001 PQ: 0 ANSI: 5
> > [    8.207052] sd 6:0:0:0: [sda] Very big device. Trying to use READ
> CAPACITY(16).
> > [    8.207117] sd 6:0:0:0: [sda] 21484367872 512-byte hardware sectors
> (10999996 MB)
> > [    8.207182] sd 6:0:0:0: [sda] Write Protect is off
> > [    8.207188] sd 6:0:0:0: [sda] Mode Sense: cb 00 00 08
> > [    8.207298] sd 6:0:0:0: [sda] Write cache: enabled, read cache: enabled,
> doesn't support DPO or FUA
> > [    8.207560] sd 6:0:0:0: [sda] Very big device. Trying to use READ
> CAPACITY(16).
> > [    8.207621] sd 6:0:0:0: [sda] 21484367872 512-byte hardware sectors
> (10999996 MB)
> > [    8.207685] sd 6:0:0:0: [sda] Write Protect is off
> > [    8.207692] sd 6:0:0:0: [sda] Mode Sense: cb 00 00 08
> > [    8.207802] sd 6:0:0:0: [sda] Write cache: enabled, read cache: enabled,
> doesn't support DPO or FUA
> > [    8.207811]  sda: sda1 sda2 sda3 sda4
> > [    8.245400] sd 6:0:0:0: [sda] Attached SCSI disk
> > [    8.247390] sd 6:0:0:0: Attached scsi generic sg0 type 0
> > [    8.249720] scsi 6:0:16:0: Processor         Areca    RAID controller 
> R001 PQ: 0 ANSI: 0
> > [    8.252259] scsi 6:0:16:0: Attached scsi generic sg1 type 3
> > [    8.504889] 3ware Storage Controller device driver for Linux
> v1.26.02.002.
> > [    8.542592] 3ware 9000 Storage Controller device driver for Linux
> v2.26.02.011.
> > [    8.601009] Adaptec aacraid driver 1.1-5[2456]-ms
> > [    8.993029] SGI XFS with ACLs, security attributes, large block/inode
> numbers, no debug enabled
> > [    8.998828] SGI XFS Quota Management subsystem
> > [    9.048982] Intel(R) PRO/1000 Network Driver - version 7.3.20-k3-NAPI
> > [    9.048990] Copyright (c) 1999-2006 Intel Corporation.
> > [   12.346739] EXT3-fs: INFO: recovery required on readonly filesystem.
> > [   12.346748] EXT3-fs: write access will be enabled during recovery.
> > [   13.161202] kjournald starting.  Commit interval 5 seconds
> > [   13.161240] EXT3-fs: sda3: orphan cleanup on readonly fs
> > [   13.161256] ext3_orphan_cleanup: deleting unreferenced inode 2501876
> > [   13.161508] ext3_orphan_cleanup: deleting unreferenced inode 2501875
> > [   13.161529] ext3_orphan_cleanup: deleting unreferenced inode 2501874
> > [   13.161549] ext3_orphan_cleanup: deleting unreferenced inode 2501873
> > [   13.161565] EXT3-fs: sda3: 4 orphan inodes deleted
> > [   13.161570] EXT3-fs: recovery complete.
> > [   13.162085] EXT3-fs: mounted filesystem with ordered data mode.
> > [   15.634222] i2c-adapter i2c-0: nForce2 SMBus adapter at 0x2440
> > [   15.634257] i2c-adapter i2c-1: nForce2 SMBus adapter at 0x2400
> > [   15.660203] forcedeth: Reverse Engineered nForce ethernet driver.
> Version 0.61.
> > [   15.660582] ACPI: PCI Interrupt Link [LMAC] enabled at IRQ 17
> > [   15.660659] forcedeth 0000:00:08.0: PCI INT A -> Link[LMAC] -> GSI 17
> (level, low) -> IRQ 17
> > [   15.660663] forcedeth 0000:00:08.0: setting latency timer to 64
> > [   15.663568] forcedeth 0000:00:08.0: ifname eth0, PHY OUI 0x5043 @ 0,
> addr 00:e0:81:76:8b:be
> > [   15.663571] forcedeth 0000:00:08.0: highdma csum vlan pwrctl mgmt timirq
> gbit lnktim msi desc-v3
> > [   15.663876] ACPI: PCI Interrupt Link [LMA2] enabled at IRQ 16
> > [   15.663898] forcedeth 0000:00:09.0: PCI INT A -> Link[LMA2] -> GSI 16
> (level, low) -> IRQ 16
> > [   15.663923] forcedeth 0000:00:09.0: setting latency timer to 64
> > [   15.665255] forcedeth 0000:00:09.0: ifname eth1, PHY OUI 0x5043 @ 1,
> addr 00:e0:81:76:8b:bf
> > [   15.665257] forcedeth 0000:00:09.0: highdma csum vlan pwrctl mgmt timirq
> gbit lnktim msi desc-v3
> > [   18.107517] EXT3 FS on sda3, internal journal
> > [   18.380032] SMsC 37B787 watchdog component driver 1.1 initialising...
> > [   18.381424] smsc37b787_wdt: Timeout set to 60 second(s).
> > [   18.381429] smsc37b787_wdt: Watchdog initialized and sleeping
> (nowayout=0)...
> > [   21.417259] kjournald starting.  Commit interval 5 seconds
> > [   21.417297] EXT3-fs warning: checktime reached, running e2fsck is
> recommended
> > [   21.422349] EXT3 FS on sda4, internal journal
> > [   21.422359] EXT3-fs: recovery complete.
> > [   21.422430] EXT3-fs: mounted filesystem with ordered data mode.
> > [   21.501039] Adding 15999992k swap on /dev/sda2.  Priority:-1 extents:1
> across:15999992k
> > [   42.038582] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state
> recovery directory
> > [   42.045903] NFSD: starting 90-second grace period
> > [   45.396027] eth0: no IPv6 routers present
> > [   49.761642] w83627hf: Found W83627HF chip at 0xc00
> 
Comment 13 Thomas Gleixner 2008-09-16 07:15:30 UTC
On Mon, 15 Sep 2008, Joshua Hoblitt wrote:

> In addition to the deadlocks, we still have the watchdog warning:
> 
> [    0.460034] Testing NMI watchdog ... 
> [    0.532557] WARNING: CPU#0: NMI appears to be stuck (0->0)!
> [    0.533301] Please report this to bugzilla.kernel.org,
> [    0.536635] and attach the output of the 'dmesg' command.
> 
> Perhaps an HPET problem:
> 
> [    0.993800] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 31
> [    0.999969] hpet0: 3 32-bit timers, 25000000 Hz
> [    1.004396] ACPI: RTC can wake from S4
> [    1.006637] Clockevents: could not switch to one-shot
> mode:<6>Clockevents: could not switch to one-shot mode: lapic is not
> functional.
> [    1.009844] Could not switch to high resolution mode on CPU 3
> [    1.009848] Clockevents: could not switch to one-shot mode: lapic is not
> functional.
> [    1.009852] Could not switch to high resolution mode on CPU 2
> [    1.009855] Clockevents: could not switch to one-shot mode: lapic is not
> functional.
> [    1.009858] Could not switch to high resolution mode on CPU 1
> [    1.009969]  lapic is not functional.
> [    1.056944] Could not switch to high resolution mode on CPU 0

No, that's documented behaviour:

> > > [    0.126660] APIC timer registered as dummy, due to nmi_watchdog=1!

Can you try nmi_watchdog=2 ?

Thanks,

	tglx
Comment 14 Cyrill Gorcunov 2008-09-16 10:57:00 UTC
[Thomas Gleixner - Tue, Sep 16, 2008 at 07:14:40AM -0700]
| On Mon, 15 Sep 2008, Joshua Hoblitt wrote:
| 
| > In addition to the deadlocks, we still have the watchdog warning:
| > 
| > [    0.460034] Testing NMI watchdog ... 
| > [    0.532557] WARNING: CPU#0: NMI appears to be stuck (0->0)!
| > [    0.533301] Please report this to bugzilla.kernel.org,
| > [    0.536635] and attach the output of the 'dmesg' command.
| > 
| > Perhaps an HPET problem:
| > 
| > [    0.993800] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 31
| > [    0.999969] hpet0: 3 32-bit timers, 25000000 Hz
| > [    1.004396] ACPI: RTC can wake from S4
| > [    1.006637] Clockevents: could not switch to one-shot
| > mode:<6>Clockevents: could not switch to one-shot mode: lapic is not functional.
| > [    1.009844] Could not switch to high resolution mode on CPU 3
| > [    1.009848] Clockevents: could not switch to one-shot mode: lapic is not functional.
| > [    1.009852] Could not switch to high resolution mode on CPU 2
| > [    1.009855] Clockevents: could not switch to one-shot mode: lapic is not functional.
| > [    1.009858] Could not switch to high resolution mode on CPU 1
| > [    1.009969]  lapic is not functional.
| > [    1.056944] Could not switch to high resolution mode on CPU 0
| 
| No, that's documented behaviour:
| 
| > > > [    0.126660] APIC timer registered as dummy, due to nmi_watchdog=1!
| 
| Can you try nmi_watchdog=2 ?
| 
| Thanks,
| 
| 	tglx
| 

And give apic=debug a try too, please. I remember there
was a problem with SB600 on the ACPI side (but it should
already be fixed).

		- Cyrill -
Comment 15 Cyrill Gorcunov 2008-09-16 10:58:33 UTC
[Cyrill Gorcunov - Tue, Sep 16, 2008 at 09:56:29PM +0400]
...
| 
| And give apic=debug a try too, please. I remember there
| was a problem with SB600 on the ACPI side (but it should
| already be fixed).
| 
| 		- Cyrill -

Sorry Thomas, I meant to send the message to Joshua

		- Cyrill -
Comment 16 Joshua Hoblitt 2008-09-16 19:43:25 UTC
In-Reply-To: <20080916175629.GE7187@lenovo>
On Tue, Sep 16, 2008 at 09:56:29PM +0400, Cyrill Gorcunov wrote:
> [Thomas Gleixner - Tue, Sep 16, 2008 at 07:14:40AM -0700]
> | Can you try nmi_watchdog=2 ?
> | 
> | Thanks,
> | 
> |     tglx
> | 
> 
> And give apic=debug a try too, please. I remember there
> was a problem with SB600 on the ACPI side (but it should
> already be fixed).

Attached is the updated dmesg from adding the nmi_watchdog=2 and acpi=debug boot params.

-J

--
Comment 17 Cyrill Gorcunov 2008-09-17 00:14:44 UTC
[j_kernel@hoblitt.com - Tue, Sep 16, 2008 at 04:43:12PM -1000]
| In-Reply-To: <20080916175629.GE7187@lenovo>
| On Tue, Sep 16, 2008 at 09:56:29PM +0400, Cyrill Gorcunov wrote:
| > [Thomas Gleixner - Tue, Sep 16, 2008 at 07:14:40AM -0700]
| > | Can you try nmi_watchdog=2 ?
| > | 
| > | Thanks,
| > | 
| > | 	tglx
| > | 
| > 
| > And give apic=debug a try too, please. I remember there
| > was a problem with SB600 on the ACPI side (but it should
| > already be fixed).
| 
| Attached is the updated dmesg from adding the nmi_watchdog=2 and acpi=debug boot params.
| 
| -J
| 

Joshua, could you please attach an ACPI table dump to the bugzilla entry (~300K)?
Not sure if it will help, but it could be useful info. Thanks for this dmesg; I'm trying
to analyze it.

(
here is how to do it
	http://kernel.org/pub/linux/kernel/people/helgaas/debug
	http://lwn.net/Articles/237085/
)

		- Cyrill -
Comment 18 Joshua Hoblitt 2008-09-17 02:18:06 UTC
Created attachment 17832 [details]
acpi table dump
Comment 19 Joshua Hoblitt 2008-09-17 02:21:35 UTC
On Wed, Sep 17, 2008 at 11:13:57AM +0400, Cyrill Gorcunov wrote:
> Joshua, could you please attach ACPI tables dump to the bugzilla entry
> (~300K)?

Done but I was unable to extract the DSDT table as described in the
instructions.

#  acpixtract DSDT < acpidump.asc           
Could not open DSDT

Cheers,

-J

--
Comment 20 Cyrill Gorcunov 2008-09-17 02:38:16 UTC
[j_kernel@hoblitt.com - Tue, Sep 16, 2008 at 11:20:58PM -1000]
| On Wed, Sep 17, 2008 at 11:13:57AM +0400, Cyrill Gorcunov wrote:
| > Joshua, could you please attach ACPI tables dump to the bugzilla entry (~300K)?
| 
| Done but I was unable to extract the DSDT table as described in the
| instructions.
| 
| #  acpixtract DSDT < acpidump.asc           
| Could not open DSDT
| 
| Cheers,
| 
| -J
| 

 it's enough, thanks!

		- Cyrill -
Comment 21 Cyrill Gorcunov 2008-09-17 05:59:51 UTC
[j_kernel@hoblitt.com - Tue, Sep 16, 2008 at 11:20:58PM -1000]
| On Wed, Sep 17, 2008 at 11:13:57AM +0400, Cyrill Gorcunov wrote:
| > Joshua, could you please attach ACPI tables dump to the bugzilla entry (~300K)?
| 
| Done but I was unable to extract the DSDT table as described in the
| instructions.
| 
| #  acpixtract DSDT < acpidump.asc           
| Could not open DSDT
| 
| Cheers,
| 
| -J
| 
| --
| 

As far as I can see there is no real issue in the ACPI configuration,
at least on the APIC side...

Eventually nmi_watchdog=2 does work, right? Though it's not
good that we've got an 8259 spurious interrupt. Hmm...

		- Cyrill -
Comment 22 Ingo Molnar 2008-09-17 06:13:31 UTC
> Eventually nmi_watchdog=2 does work right? Though it's not good that 
> we've got 8259 spurious interrupt. Hmm...

well, nmi_watchdog=2 changes the layout of clockevent devices and easily 
switches the system into non-highres non-dynticks mode. So it can hide 
bugs.

	Ingo
Comment 23 Cyrill Gorcunov 2008-09-17 06:29:30 UTC
[Ingo Molnar - Wed, Sep 17, 2008 at 03:13:06PM +0200]
| 
| > Eventually nmi_watchdog=2 does work right? Though it's not good that 
| > we've got 8259 spurious interrupt. Hmm...
| 
| well, nmi_watchdog=2 changes the layout of clockevent devices and easily 
| switches the system into non-highres non-dynticks mode. So it can hide 
| bugs.
| 
| 	Ingo
| 

And it does, since we have the NMI stuck on the LAPICs. Still investigating.

		- Cyrill -
Comment 24 Cyrill Gorcunov 2008-09-17 09:27:40 UTC
[j_kernel@hoblitt.com - Tue, Sep 16, 2008 at 11:20:58PM -1000]
| On Wed, Sep 17, 2008 at 11:13:57AM +0400, Cyrill Gorcunov wrote:
| > Joshua, could you please attach ACPI tables dump to the bugzilla entry (~300K)?
| 
| Done but I was unable to extract the DSDT table as described in the
| instructions.
| 
| #  acpixtract DSDT < acpidump.asc           
| Could not open DSDT
| 
| Cheers,
| 
| -J
| 
| --
| 

Joshua, could you please do one more thing: boot the kernel
with "debug apic=debug nmi_watchdog=1" (among others) and
publish the dmesg?

		- Cyrill -
Comment 25 Thomas Gleixner 2008-09-17 11:14:34 UTC
> > Eventually nmi_watchdog=2 does work right? Though it's not good that 
> > we've got 8259 spurious interrupt. Hmm...
> 
> well, nmi_watchdog=2 changes the layout of clockevent devices and easily 
> switches the system into non-highres non-dynticks mode. So it can hide 
> bugs.

No, nmi_watchdog=1 is the one which goes into non-highres non-dynticks
mode. That's why I asked to use nmi_watchdog=2

Thanks,

	tglx
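
To check a saved dmesg for this fallback, a minimal Python sketch follows. It only looks for two messages that appear in the logs quoted in this report ("APIC timer registered as dummy" and "Could not switch to high resolution mode on CPU N"). The function name and the shape of the returned report are illustrative assumptions, not part of any kernel tool.

```python
import re

# Message strings copied from the dmesg excerpts quoted in this report.
DUMMY_MARKER = "APIC timer registered as dummy"
NO_HIGHRES = re.compile(r"Could not switch to high resolution mode on CPU (\d+)")


def highres_report(dmesg_text):
    """Summarise whether the boot fell back to non-highres mode.

    Returns a dict with a flag for the dummy APIC timer message and the
    sorted list of CPUs that reported a failed switch to highres mode.
    """
    cpus = sorted({int(n) for n in NO_HIGHRES.findall(dmesg_text)})
    return {
        "apic_timer_dummy": DUMMY_MARKER in dmesg_text,
        "cpus_without_highres": cpus,
    }
```

Running it over the dmesg from an nmi_watchdog=1 boot and from an nmi_watchdog=2 boot gives a quick way to compare the two configurations discussed above.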
Comment 26 Joshua Hoblitt 2008-09-17 13:49:33 UTC
On Wed, Sep 17, 2008 at 08:27:04PM +0400, Cyrill Gorcunov wrote:
> Joshua, could you please do one more thing - boot the kernel
> with "debug apic=debug nmi_watchdog=1" (among others) and
> publish dmesg?

dmesg from "debug apic=debug nmi_watchdog=2" is attached.
                                          ^

-J

--
Comment 27 Cyrill Gorcunov 2008-09-18 00:10:23 UTC
[j_kernel@hoblitt.com - Wed, Sep 17, 2008 at 10:48:50AM -1000]
| On Wed, Sep 17, 2008 at 08:27:04PM +0400, Cyrill Gorcunov wrote:
| > Joshua, could you please do one more thing - boot the kernel
| > with "debug apic=debug nmi_watchdog=1" (among others) and
| > publish dmesg?
| 
| dmesg from "debug apic=debug nmi_watchdog=2" is attached.
|                                           ^
| 
| -J
| 
| --
...

No Joshua, exactly 'nmi_watchdog=1', since it's important to
find out what is responsible for the CPU being stuck while testing
the NMI watchdog. 'debug' and 'apic=debug' will show additional info
about what is happening. And it would be just great if you were able
to try the latest -tip/master

	http://people.redhat.com/mingo/tip.git/README

since a lot of work and fixes have gone in there. If it's a problem
to fetch this kernel through git, Ingo or I (I suppose)
could make a tar.bz2 archive so you won't need to fetch
the whole history. I wouldn't say I'm really a specialist
in this area, so your report could be quite important
not only for me but for others too.

		- Cyrill -
Comment 28 Rafael J. Wysocki 2008-09-21 12:10:15 UTC
References : http://marc.info/?l=linux-kernel&m=122117786124326&w=4
Handled-By : Thomas Gleixner <tglx@linutronix.de>
Handled-By : Cyrill Gorcunov <gorcunov@gmail.com>
Handled-By : Ingo Molnar <mingo@elte.hu>
Comment 29 Joshua Hoblitt 2008-09-26 20:04:20 UTC
Same issue or is this a separate problem with the watchdog?

ipp018 login: [  717.820766] BUG: NMI Watchdog detected LOCKUP on CPU2, ip ffffffff80211f60, registers:
[  717.820766] CPU 2 
[  717.820766] Modules linked in: w83627hf hwmon_vid autofs4 smsc37b787_wdt k8temp forcedeth i2c_nforce2 i2c_core tg3 libphy e1000 xfs dm_snapshot dm_mirror dm_log aacraid 3w_9xxx 3w_xxxx atp870u arcmsr aic7xxx scsi_wait_scan
[  717.820766] Pid: 0, comm: swapper Not tainted 2.6.27-rc5-22033-gd26acd9-dirty #2
[  717.820766] RIP: 0010:[<ffffffff80211f60>]  [<ffffffff80211f60>] native_read_tsc+0xb/0x18
[  717.820766] RSP: 0018:ffff88012fab7db0  EFLAGS: 00000002
[  717.820766] RAX: 000000007d8e8ab6 RBX: 0000000000000001 RCX: 000000000004a007
[  717.820766] RDX: 0000000000000205 RSI: 0000000000000000 RDI: 0000000000000001
[  717.820766] RBP: 000000007d8e8a5e R08: 0000000000000001 R09: 0000000000000001
[  717.820766] R10: 0000000000000000 R11: ffffffff803bac5e R12: 0000000000000002
[  717.820766] R13: 00000000a7c04084 R14: 0000000000000800 R15: 0000000000000000
[  717.820766] FS:  00000000006f5360(0000) GS:ffff88022fa22300(0000) knlGS:0000000000000000
[  717.820766] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[  717.820766] CR2: 00007fbf400750e0 CR3: 000000022dc37000 CR4: 00000000000006e0
[  717.820766] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  717.820766] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  717.820766] Process swapper (pid: 0, threadinfo ffff88022fa5e000, task ffff88012fa83c40)
[  717.820766] Stack:  ffffffff803bac7a ffff880136257100 0000000004b06ebc 0000000000000001
[  717.820766]  ffffffff803be1b7 0000000000000000 ffff880136257100 ffff88012fab7e68
[  717.820766]  0000000000000092 ffff880136257100 ffffffff805c7f70 ffffffff802319b9
[  717.820766] Call Trace:
[  717.820766]  <IRQ>  [<ffffffff803bac7a>] ? delay_tsc+0x1c/0x42
[  717.820766]  [<ffffffff803be1b7>] ? _raw_spin_lock+0xb8/0x120
[  717.820766]  [<ffffffff805c7f70>] ? _spin_lock_irqsave+0x37/0x3f
[  717.820766]  [<ffffffff802319b9>] ? tg_shares_up+0xcf/0x195
[  717.820766]  [<ffffffff802319b9>] ? tg_shares_up+0xcf/0x195
[  717.820766]  [<ffffffff802318ea>] ? tg_shares_up+0x0/0x195
[  717.820766]  [<ffffffff80229658>] ? tg_nop+0x0/0x6
[  717.820766]  [<ffffffff8022ad28>] ? walk_tg_tree+0x90/0xbc
[  717.820766]  [<ffffffff8022ac98>] ? walk_tg_tree+0x0/0xbc
[  717.820766]  [<ffffffff8022e2ed>] ? try_to_wake_up+0xb0/0x238
[  717.820766]  [<ffffffff80249fa8>] ? hrtimer_wakeup+0x0/0x21
[  717.820766]  [<ffffffff80249fc5>] ? hrtimer_wakeup+0x1d/0x21
[  717.820766]  [<ffffffff80249dba>] ? __run_hrtimer+0x55/0x92
[  717.820766]  [<ffffffff8024a7e7>] ? hrtimer_interrupt+0xe3/0x15d
[  717.820766]  [<ffffffff8021c23c>] ? smp_apic_timer_interrupt+0x89/0xa9
[  717.820766]  [<ffffffff8020cbb6>] ? apic_timer_interrupt+0x66/0x70
[  717.820766]  <EOI>  [<ffffffff805ca561>] ? __atomic_notifier_call_chain+0x0/0x83
[  717.820766]  [<ffffffff805ca561>] ? __atomic_notifier_call_chain+0x0/0x83
[  717.820766]  [<ffffffff80212280>] ? default_idle+0x27/0x3b
[  717.820766]  [<ffffffff80212469>] ? c1e_idle+0xd4/0xd8
[  717.820766]  [<ffffffff8020abd1>] ? cpu_idle+0x7b/0xa1
[  717.820766] 
[  717.820766] 
[  717.820766] Code: 8c 1a 00 c3 90 90 90 40 0f b6 c7 e6 70 e4 71 0f b6 c0 c3 40 0f b6 c6 e6 70 40 0f b6 c7 e6 71 c3 0f ae f0 66 66 90 0f 31 0f ae f0 <66> 66 90 48 c1 e2 20 89 c0 48 09 d0 c3 41 57 41 56 41 55 41 54 
[  717.820766] ---[ end trace 32605f607ce8ad50 ]---
[  717.820766] Kernel panic - not syncing: Aiee, killing interrupt handler!
[  717.821183] BUG: NMI Watchdog detected LOCKUP on CPU3, ip ffffffff8024bbf4, registers:
[  717.821183] CPU 3 
[  717.821183] Modules linked in: w83627hf hwmon_vid autofs4 smsc37b787_wdt k8temp forcedeth i2c_nforce2 i2c_core tg3 libphy e1000 xfs dm_snapshot dm_mirror dm_log aacraid 3w_9xxx 3w_xxxx atp870u arcmsr aic7xxx scsi_wait_scan
[  717.821183] Pid: 0, comm: swapper Not tainted 2.6.27-rc5-22033-gd26acd9-dirty #2
[  717.821183] RIP: 0010:[<ffffffff8024bbf4>]  [<ffffffff8024bbf4>] sched_clock_cpu+0x116/0x12b
[  717.821183] RSP: 0018:ffff88022fa93e58  EFLAGS: 00000002
[  717.821183] RAX: 000000000000292a RBX: ffffffff80919a80 RCX: ffffffff8090f478
[  717.821183] RDX: ffff880136253040 RSI: 000000b62fead5f3 RDI: 000000b62fead5f3
[  717.821183] RBP: ffff880136257a80 R08: 0000000000000001 R09: 0000000000010000
[  717.821183] R10: ffff8801b593e000 R11: 0000000000000000 R12: 0000000000000003
[  717.821183] R13: ffffffff80915310 R14: 0000000000000018 R15: 0000000000000003
[  717.821183] FS:  00007fbf407036f0(0000) GS:ffff88022fa22780(0000) knlGS:0000000000000000
[  717.821183] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[  717.821183] CR2: 0000000000615198 CR3: 000000020e43a000 CR4: 00000000000006e0
[  717.821183] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  717.821183] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  717.821183] Process swapper (pid: 0, threadinfo ffff88012fadc000, task ffff88022fa88000)
[  717.821183] Stack:  0000000000000096 0000000000000000 ffffffff80915318 ffffffff8024bd13
[  717.821183]  000000000000029c ffffffff8027233f 000000010001e9d8 ffffffff8027247a
[  717.821183]  ffff88012fadde78 0000000000000003 0000000000000000 ffff88022fa88000
[  717.821183] Call Trace:
[  717.821183]  <IRQ>  [<ffffffff8024bd13>] ? cpu_clock+0x9/0xe
[  717.821183]  [<ffffffff8027233f>] ? get_timestamp+0x9/0xf
[  717.821183]  [<ffffffff8027247a>] ? softlockup_tick+0xc7/0x1ab
[  717.821183]  [<ffffffff8023de88>] ? update_process_times+0x26/0x4b
[  717.821183]  [<ffffffff80250535>] ? tick_sched_timer+0x76/0xa7
[  717.821183]  [<ffffffff802504bf>] ? tick_sched_timer+0x0/0xa7
[  717.821183]  [<ffffffff80249dba>] ? __run_hrtimer+0x55/0x92
[  717.821183]  [<ffffffff8024a7e7>] ? hrtimer_interrupt+0xe3/0x15d
[  717.821183]  [<ffffffff8021c23c>] ? smp_apic_timer_interrupt+0x89/0xa9
[  717.821183]  [<ffffffff8020cbb6>] ? apic_timer_interrupt+0x66/0x70
[  717.821183]  <EOI>  [<ffffffff805ca561>] ? __atomic_notifier_call_chain+0x0/0x83
[  717.821183]  [<ffffffff805ca561>] ? __atomic_notifier_call_chain+0x0/0x83
[  717.821183]  [<ffffffff80212280>] ? default_idle+0x27/0x3b
[  717.821183]  [<ffffffff80212469>] ? c1e_idle+0xd4/0xd8
[  717.821183]  [<ffffffff8020abd1>] ? cpu_idle+0x7b/0xa1
[  717.821183] 
[  717.821183] 
[  717.821183] Code: 48 8b 45 18 48 39 d0 79 09 48 89 d0 48 89 55 18 eb 04 48 89 43 18 fe 03 eb 1e b8 00 01 00 00 f0 66 0f c1 45 00 38 e0 74 07 f3 90 <8a> 45 00 eb f5 48 89 ef e8 5a fe ff ff fe 45 00 5b 5d 41 5c c3 
[  717.821183] ---[ end trace 32605f607ce8ad50 ]---
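
To compare the two CPU traces above without reading raw addresses, a minimal Python sketch follows. It lists the function names in each "Call Trace:" section and records whether a frame carries the "?" marker, which the kernel prints for entries the stack unwinder could not verify. The regular expression is derived from the trace lines in this report only, and parse_call_trace is an illustrative name.

```python
import re

# Matches entries such as "[<ffffffff803bac7a>] ? delay_tsc+0x1c/0x42".
FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def parse_call_trace(oops_text):
    """Return (function, uncertain) tuples for every call-trace frame.

    Only lines between "Call Trace:" and the "Code:" or "end trace"
    markers are considered, so the RIP line is not counted as a frame.
    """
    frames = []
    in_trace = False
    for line in oops_text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if "Code:" in line or "end trace" in line:
            in_trace = False
            continue
        if in_trace:
            match = FRAME.search(line)
            if match:
                frames.append((match.group(2), match.group(1) is not None))
    return frames
```

Fed the two traces above, it would show both CPUs entering through apic_timer_interrupt and hrtimer_interrupt from the idle loop before they diverge, which is easier to see in a list of names than in the raw dump.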
Comment 30 Thomas Gleixner 2008-09-27 02:56:15 UTC
> Same issue or is this a separate problem with the watchdog?

Hmm, not sure. Does the system lockup when you disable nmi_watchdog ?
We want to be sure that this is not some watchdog weirdness.

Thanks,

	tglx
Comment 31 Cyrill Gorcunov 2008-09-28 12:16:18 UTC
Oh... someone removed me from the CC list; fixed. Joshua, I don't see it in this thread: did you try playing with the "idle=" boot param? Possible options are: poll, mwait, halt, nomwait. I wonder if it helps. Thomas, what do you think?
Comment 32 Joshua Hoblitt 2008-09-28 12:39:36 UTC
> ------- Comment #31 from gorcunov@gmail.com  2008-09-28 12:16 -------
> oh...someone removed me from CC list, fixed. Joshua I don't see from this
> thread - did you try to play with "idle=" boot param? Possible options are:
> poll,mwait,halt,nomwait. Wonder if it helps. Thomas what do you think?

Nope.  I've not fiddled with idle.

-J

--
Comment 33 Cyrill Gorcunov 2008-09-29 09:31:21 UTC
Could you try it, please? I'm not sure it will help, but I'm really puzzled about how we get stuck in the NMI watchdog in your case, losing all interrupt activity. I hope testing it won't take too much of your time.
Comment 34 Joshua Hoblitt 2008-09-29 12:57:56 UTC
> ------- Comment #33 from gorcunov@gmail.com  2008-09-29 09:31 -------
> Could you try please? Not sure if it help but I'm really messed now with how
> we get stuck in NMI watchdog in your case losing any interrupt activity. It
> should not eat too much time of you to test it I hope.

What is my best bet? idle=poll ?
Comment 35 Cyrill Gorcunov 2008-09-29 13:24:55 UTC
Why not :) And give nomwait a chance too.
Comment 36 Joshua Hoblitt 2008-09-29 13:33:39 UTC
> ------- Comment #35 from gorcunov@gmail.com  2008-09-29 13:24 -------
> Why not :) And give nomwait a chance too.

How do you find out what mode the kernel has decided to use on its own?
Will nomwait interfere with dynticks?

-J

--
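
As a partial aid for the question above, the sketch below (Python, illustrative only) splits a kernel command line into parameters so it is easy to confirm whether "idle=" or "nmi_watchdog=" was passed explicitly. It does not reveal which idle routine the kernel picked by default, and it assumes no quoted values containing spaces. The command line in the usage note is the one from the 2.6.27-rc5 boot log in this report; parse_cmdline is a hypothetical helper name.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: [values]}.

    Parameters may repeat (e.g. console=), so each name maps to a list.
    Flags without '=' are stored with a value of None.
    Assumption: values do not contain quoted spaces.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params.setdefault(name, []).append(value if sep else None)
    return params


def explicitly_set(cmdline, name):
    """Return True if the named parameter appears on the command line."""
    return name in parse_cmdline(cmdline)
```

For example, calling explicitly_set() with the line "root=/dev/ram0 real_root=/dev/sda3 init=/linuxrc show_msr=1 apic=debug console=tty0 console=ttyS0,115200n8" and the name "idle" shows that no idle override was in effect for that boot.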
Comment 37 Joshua Hoblitt 2008-09-29 14:28:13 UTC
Argh, got another oops.  I'm going to try disabling the watchdog as Thomas suggested.

<Sep/29 10:01 am>/dev/sda3: 39561[   92.023748] ------------[ cut here ]------------
<Sep/29 10:01 am>7/4006240 files [   92.034528] EXT3 FS on sda3, internal journal
<Sep/29 10:01 am>(1.0% non-contig[   92.023748] WARNING: at kernel/mutex.c:351 mutex_trylock+0x52/0x111()
<Sep/29 10:01 am>uous), 1620981/8[   92.023748] Modules linked in:000000 blocks

<Sep/29 10:01 am>  k8temp                 forcedeth                 i2c_nforce2          [ ok ] i2c_core
<Sep/29 10:01 am> * Remounting  tg3root filesystem  libphyread/write ...   e1000                 xfs            [ ok dm_snapshot ]
<Sep/29 10:01 am> dm_mirror dm_log aacraid 3w_9xxx 3w_xxxx atp870u arcmsr aic7xxx scsi_wait_scan
<Sep/29 10:01 am>[   92.023748] Pid: 0, comm: swapper Not tainted 2.6.27-rc5-22033-gd26acd9-dirty #2
<Sep/29 10:01 am>[   92.023748] 
<Sep/29 10:01 am>[   92.023748] Call Trace:
<Sep/29 10:01 am>[   92.023748]  <IRQ>  [<ffffffff80235361>] warn_on_slowpath+0x51/0x77
<Sep/29 10:01 am>[   92.023748]  [<ffffffff80235816>] release_console_sem+0x3e/0x1a1
<Sep/29 10:01 am>[   92.023748]  [<ffffffff805c63de>] mutex_trylock+0x52/0x111
<Sep/29 10:01 am>[   92.023748]  [<ffffffff80235ef4>] printk+0x4e/0x56
<Sep/29 10:01 am>[   92.023748]  [<ffffffff8025f3cf>] crash_kexec+0x17/0xef
<Sep/29 10:01 am>[   92.023748]  [<ffffffff8020abd1>] cpu_idle+0x7b/0xa1
<Sep/29 10:01 am>[   92.023748]  [<ffffffff80235816>] release_console_sem+0x3e/0x1a1
<Sep/29 10:01 am>[   92.023748]  [<ffffffff8020abd1>] cpu_idle+0x7b/0xa1
<Sep/29 10:01 am>[   92.023748]  [<ffffffff8020abd1>] cpu_idle+0x7b/0xa1
<Sep/29 10:01 am>[   92.023748]  [<ffffffff8040aa5a>] do_unblank_screen+0xd/0x10c
<Sep/29 10:01 am>[   92.023748]  [<ffffffff803bbab9>] bust_spinlocks+0x15/0x30
<Sep/29 10:01 am>[   92.023748]  [<ffffffff80235260>] panic+0x8f/0x13f
<Sep/29 10:01 am>[   92.023748]  [<ffffffff80235816>] release_console_sem+0x3e/0x1a1
<Sep/29 10:01 am>[   92.023748]  [<ffffffff80235816>] release_console_sem+0x3e/0x1a1
<Sep/29 10:01 am>[   92.023748]  [<ffffffff80272551>] softlockup_tick+0x19e/0x1ab
<Sep/29 10:01 am>[   92.023748]  [<ffffffff8023de88>] update_process_times+0x26/0x4b

<Sep/29 10:01 am>[   92.023748]  [<ffffffff80250535>] tick_sched_timer+0x76/0xa7
<Sep/29 10:01 am>[   92.023748]  [<ffffffff802504bf>] tick_sched_timer+0x0/0xa7
<Sep/29 10:01 am>[   92.023748]  [<ffffffff80249dba>] __run_hrtimer+0x55/0x92
<Sep/29 10:01 am>[   92.023748]  [<ffffffff8024a7e7>] hrtimer_interrupt+0xe3/0x15d
<Sep/29 10:01 am>[   92.023748]  [<ffffffff8021c23c>] smp_apic_timer_interrupt+0x89/0xa9
<Sep/29 10:01 am>[   92.023748]  [<ffffffff8020cbb6>] apic_timer_interrupt+0x66/0x70
<Sep/29 10:01 am>[   92.023748]  <EOI>  [<ffffffff80212395>] c1e_idle+0x0/0xd8
<Sep/29 10:01 am>[   92.023748]  [<ffffffff805ca561>] __atomic_notifier_call_chain+0x0/0x83
<Sep/29 10:01 am>[   92.023748]  [<ffffffff80212280>] default_idle+0x27/0x3b
<Sep/29 10:01 am>[   92.023748]  [<ffffffff80212469>] c1e_idle+0xd4/0xd8
<Sep/29 10:01 am>[   92.023748]  [<ffffffff8020abd1>] cpu_idle+0x7b/0xa1
<Sep/29 10:01 am>[   92.023748] 
<Sep/29 10:01 am>[   92.023748] ---[ end trace 16fac595db54a2dd ]---
<Sep/29 10:01 am> * Using /etc/mo[   92.023748] ------------[ cut here ]------------
<Sep/29 10:01 am>dules.autoload.d[   92.023748] WARNING: at kernel/smp.c:332 smp_call_function_mask+0x37/0x1d6()
<Sep/29 10:01 am>/kernel-2.6 as c[   92.023748] Modules linked in:onfig:
<Sep/29 10:01 am> *   Loa k8tempding module smsc forcedeth37b787_wdt ... i2c_nforce2 i2c_core tg3 libphy e1000 xfs dm_snapshot dm_mirror dm_log aacraid 3w_9xxx 3w_xxxx atp870u arcmsr aic7xxx scsi_wait_scan
<Sep/29 10:01 am>[   92.023748] Pid: 0, comm: swapper Tainted: G        W 2.6.27-rc5-22033-gd26acd9-dirty #2
<Sep/29 10:01 am>[   92.023748] 
<Sep/29 10:01 am>[   92.023748] Call Trace:
<Sep/29 10:01 am>[   92.023748]  <IRQ>  [<ffffffff80235361>] warn_on_slowpath+0x51/0x77
<Sep/29 10:01 am>[   92.023748]  [<ffffffff80234f85>] print_oops_end_marker+0x9/0x20

<Sep/29 10:01 am>[   92.023748]  [<ffffffff80235366>] warn_on_slowpath+0x56/0x77
<Sep/29 10:01 am>[   92.023748]  [<ffffffff80256c53>] smp_call_function_mask+0x37/0x1d6
<Sep/29 10:01 am>[   92.023748]  [<ffffffff8025f3cf>] crash_kexec+0x17/0xef
<Sep/29 10:01 am>[   92.023748]  [<ffffffff8025f49e>] crash_kexec+0xe6/0xef
<Sep/29 10:01 am>[   92.023748]  [<ffffffff8020abd1>] cpu_idle+0x7b/0xa1
<Sep/29 10:01 am>[   92.023748]  [<ffffffff80235816>] release_console_sem+0x3e/0x1a1
<Sep/29 10:01 am>[   92.023748]  [<ffffffff8020abd1>] cpu_idle+0x7b/0xa1
<Sep/29 10:01 am>[   92.023748]  [<ffffffff8020abd1>] cpu_idle+0x7b/0xa1
<Sep/29 10:01 am>[   92.023748]  [<ffffffff8021b596>] native_smp_send_stop+0x1a/0x26
<Sep/29 10:01 am>[   92.023748]  [<ffffffff80235266>] panic+0x95/0x13f
<Sep/29 10:01 am>[   92.023748]  [<ffffffff80235816>] release_console_sem+0x3e/0x1a1
<Sep/29 10:01 am>[   92.023748]  [<ffffffff80235816>] release_console_sem+0x3e/0x1a1
<Sep/29 10:01 am>[   92.023748]  [<ffffffff80272551>] softlockup_tick+0x19e/0x1ab
<Sep/29 10:01 am>[   92.023748]  [<ffffffff8023de88>] update_process_times+0x26/0x4b
<Sep/29 10:01 am>[   92.023748]  [<ffffffff80250535>] tick_sched_timer+0x76/0xa7
<Sep/29 10:01 am>[   92.023748]  [<ffffffff802504bf>] tick_sched_timer+0x0/0xa7
<Sep/29 10:01 am>[   92.023748]  [<ffffffff80249dba>] __run_hrtimer+0x55/0x92
<Sep/29 10:01 am>[   92.023748]  [<ffffffff8024a7e7>] hrtimer_interrupt+0xe3/0x15d
<Sep/29 10:01 am>[   92.023748]  [<ffffffff8021c23c>] smp_apic_timer_interrupt+0x89/0xa9
<Sep/29 10:01 am>[   92.023748]  [<ffffffff8020cbb6>] apic_timer_interrupt+0x66/0x70
<Sep/29 10:01 am>[   92.023748]  <EOI>  [<ffffffff80212395>] c1e_idle+0x0/0xd8
<Sep/29 10:01 am>[   92.023748]  [<ffffffff805ca561>] __atomic_notifier_call_chain+0x0/0x83

<Sep/29 10:01 am>[   92.023748]  [<ffffffff80212280>] default_idle+0x27/0x3b
<Sep/29 10:01 am>[   92.023748]  [<ffffffff80212469>] c1e_idle+0xd4/0xd8
<Sep/29 10:01 am>[   92.023748]  [<ffffffff8020abd1>] cpu_idle+0x7b/0xa1
<Sep/29 10:01 am>[   92.023748] 
<Sep/29 10:01 am>[   92.023748] ---[ end trace 16fac595db54a2dd ]---
Comment 38 Joshua Hoblitt 2008-09-29 18:50:11 UTC
It happened again without nmi_watchdog set.

  Booting 'linux 2.6.27-rc5 (netdev) genkerne
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Linux version 2.6.27-rc5-22033-gd26acd9-dirty (root@ipp000) (gcc version 4.1.2 (Gentoo 4.1.2 p1.1)) #2 SMP Fri Sep 12 10:04:18 HST 2008
[    0.000000] Command line: root=/dev/ram0 real_root=/dev/sda3 init=/linuxrc show_msr=1 apic=debug console=tty0 console=ttyS0,115200n8
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Centaur CentaurHauls
[    0.000000] BIOS-provided physical RAM map:                                  
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009dc00 (usable)         
[    0.000000]  BIOS-e820: 000000000009dc00 - 00000000000a0000 (reserved)       
[    0.000000]  BIOS-e820: 00000000000ce000 - 0000000000100000 (reserved)       
[    0.000000]  BIOS-e820: 0000000000100000 - 00000000cff60000 (usable)         
[    0.000000]  BIOS-e820: 00000000cff60000 - 00000000cff6a000 (ACPI data)      
[    0.000000]  BIOS-e820: 00000000cff6a000 - 00000000cff80000 (ACPI NVS)       
[    0.000000]  BIOS-e820: 00000000cff80000 - 00000000d0000000 (reserved)       
[    0.000000]  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)       
[    0.000000]  BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)       
[    0.000000]  BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)       
[    0.000000]  BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)       
[    0.000000]  BIOS-e820: 0000000100000000 - 0000000230000000 (usable)         
[    0.000000] last_pfn = 0x230000 max_arch_pfn = 0x3ffffffff                   
[    0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106 
[    0.000000] last_pfn = 0xcff60 max_arch_pfn = 0x3ffffffff
[    0.000000] init_memory_mapping
[    0.000000] last_map_addr: cff60000 end: cff60000
[    0.000000] init_memory_mapping
[    0.000000] last_map_addr: 230000000 end: 230000000
[    0.000000] RAMDISK: 37e0f000 - 37fef850
[    0.000000] DMI present.
[    0.000000] ACPI: RSDP 000F75D0, 0024 (r2 PTLTD )
[    0.000000] ACPI: XSDT CFF656FA, 0064 (r1 PTLTD       XSDT    6040000  LTP        0)
[    0.000000] ACPI: FACP CFF6970E, 00F4 (r3 NVIDIA CK8S      6040000 PTL_    F4240)
[    0.000000] ACPI: DSDT CFF6575E, 3F3C (r1 NVIDIA      CK8  6040000 MSFT  3000000)
[    0.000000] ACPI: FACS CFF6AFC0, 0040
[    0.000000] ACPI: SSDT CFF69802, 0574 (r1 AMD    POWERNOW  6040000 AMD         1)
[    0.000000] ACPI: SRAT CFF69D76, 0110 (r1 AMD    HAMMER    6040000 AMD         1)
[    0.000000] ACPI: SPCR CFF69E86, 0050 (r1 PTLTD  $UCRTBL$  6040000 PTL         1)
[    0.000000] ACPI: MCFG CFF69ED6, 003C (r1 PTLTD    MCFG    6040000  LTP        0)
[    0.000000] ACPI: HPET CFF69F12, 0038 (r1 PTLTD  HPETTBL   6040000  LTP        1)
[    0.000000] ACPI: APIC CFF69F4A, 008E (r1 PTLTD       APIC    6040000  LTP        0)
[    0.000000] ACPI: BOOT CFF69FD8, 0028 (r1 PTLTD  $SBFTBL$  6040000  LTP        1)
[    0.000000] SRAT: PXM 0 -> APIC 0 -> Node 0
[    0.000000] SRAT: PXM 0 -> APIC 1 -> Node 0
[    0.000000] SRAT: PXM 1 -> APIC 2 -> Node 1
[    0.000000] SRAT: PXM 1 -> APIC 3 -> Node 1
[    0.000000] SRAT: Node 0 PXM 0 0-a0000
[    0.000000] SRAT: Node 0 PXM 0 100000-d0000000
[    0.000000] SRAT: Node 0 PXM 0 100000000-130000000
[    0.000000] SRAT: Node 1 PXM 1 130000000-230000000
[    0.000000] Bootmem setup node 0 0000000000000000-0000000130000000
[    0.000000]   NODE_DATA [0000000000015680 - 000000000002067f]
[    0.000000]   bootmap [0000000000021000 -  0000000000046fff] pages 26
[    0.000000] (8 early reservations) ==> bootmem [0000000000 - 0130000000]
[    0.000000]   #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
[    0.000000]   #1 [0000006000 - 0000008000]       TRAMPOLINE ==> [0000006000 - 0000008000]
[    0.000000]   #2 [0000200000 - 0000f96910]    TEXT DATA BSS ==> [0000200000 - 0000f96910]
[    0.000000]   #3 [0037e0f000 - 0037fef850]          RAMDISK ==> [0037e0f000 - 0037fef850]
[    0.000000]   #4 [000009dc00 - 0000100000]    BIOS reserved ==> [000009dc00 - 0000100000]
[    0.000000]   #5 [0000008000 - 000000c000]          PGTABLE ==> [0000008000 - 000000c000]
[    0.000000]   #6 [000000c000 - 0000011000]          PGTABLE ==> [000000c000 - 0000011000]
[    0.000000]   #7 [0000011000 - 0000015680]       MEMNODEMAP ==> [0000011000 - 0000015680]
[    0.000000] Bootmem setup node 1 0000000130000000-0000000230000000
[    0.000000]   NODE_DATA [0000000130000000 - 000000013000afff]
[    0.000000]   bootmap [000000013000b000 -  000000013002afff] pages 20
[    0.000000] (8 early reservations) ==> bootmem [0130000000 - 0230000000]
[    0.000000]   #0 [0000000000 - 0000001000]   BIOS data page
[    0.000000]   #1 [0000006000 - 0000008000]       TRAMPOLINE
[    0.000000]   #2 [0000200000 - 0000f96910]    TEXT DATA BSS
[    0.000000]   #3 [0037e0f000 - 0037fef850]          RAMDISK
[    0.000000]   #4 [000009dc00 - 0000100000]    BIOS reserved
[    0.000000]   #5 [0000008000 - 000000c000]          PGTABLE
[    0.000000]   #6 [000000c000 - 0000011000]          PGTABLE
[    0.000000]   #7 [0000011000 - 0000015680]       MEMNODEMAP
[    0.000000] Scan SMP from ffff880000000000 for 1024 bytes.
[    0.000000] Scan SMP from ffff88000009fc00 for 1024 bytes.
[    0.000000] Scan SMP from ffff8800000f0000 for 65536 bytes.
[    0.000000] found SMP MP-table at [ffff8800000f7600] 000f7600
[    0.000000] Zone PFN ranges:
[    0.000000]   DMA      0x00000000 -> 0x00001000
[    0.000000]   DMA32    0x00001000 -> 0x00100000
[    0.000000]   Normal   0x00100000 -> 0x00230000
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[4] active PFN ranges
[    0.000000]     0: 0x00000000 -> 0x0000009d
[    0.000000]     0: 0x00000100 -> 0x000cff60
[    0.000000]     0: 0x00100000 -> 0x00130000
[    0.000000]     1: 0x00130000 -> 0x00230000
[    0.000000] Detected use of extended apic ids on hypertransport bus
[    0.000000] Detected use of extended apic ids on hypertransport bus
[    0.000000] ACPI: PM-Timer IO Port: 0x1008
[    0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
[    0.000000] ACPI: IOAPIC (id[0x04] address[0xfec00000] gsi_base[0])
[    0.000000] IOAPIC[0]: apic_id 4, version 0, address 0xfec00000, GSI 0-23
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.000000] Setting APIC routing to flat
[    0.000000] ACPI: HPET id: 0x10de8201 base: 0xfed00000
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] SMP: Allowing 4 CPUs, 0 hotplug CPUs
[    0.000000] mapped APIC to ffffffffff5fc000 (        fee00000)
[    0.000000] mapped IOAPIC to ffffffffff5fb000 (00000000fec00000)
[    0.000000] PM: Registered nosave memory: 000000000009d000 - 000000000009e000
[    0.000000] PM: Registered nosave memory: 000000000009e000 - 00000000000a0000
[    0.000000] PM: Registered nosave memory: 00000000000a0000 - 00000000000ce000
[    0.000000] PM: Registered nosave memory: 00000000000ce000 - 0000000000100000
[    0.000000] PM: Registered nosave memory: 00000000cff60000 - 00000000cff6a000
[    0.000000] PM: Registered nosave memory: 00000000cff6a000 - 00000000cff80000
[    0.000000] PM: Registered nosave memory: 00000000cff80000 - 00000000d0000000
[    0.000000] PM: Registered nosave memory: 00000000d0000000 - 00000000e0000000
[    0.000000] PM: Registered nosave memory: 00000000e0000000 - 00000000f0000000
[    0.000000] PM: Registered nosave memory: 00000000f0000000 - 00000000fec00000
[    0.000000] PM: Registered nosave memory: 00000000fec00000 - 00000000fec10000
[    0.000000] PM: Registered nosave memory: 00000000fec10000 - 00000000fee00000
[    0.000000] PM: Registered nosave memory: 00000000fee00000 - 00000000fee01000
[    0.000000] PM: Registered nosave memory: 00000000fee01000 - 00000000fff80000
[    0.000000] PM: Registered nosave memory: 00000000fff80000 - 0000000100000000
[    0.000000] Allocating PCI resources starting at d1000000 (gap: d0000000:10000000)
[    0.000000] PERCPU: Allocating 51872 bytes of per cpu data
[    0.000000] Built 2 zonelists in Node order, mobility grouping on.  Total pages: 2039539
[    0.000000] Policy zone: Normal
[    0.000000] Kernel command line: root=/dev/ram0 real_root=/dev/sda3 init=/linuxrc show_msr=1 apic=debug console=tty0 console=ttyS0,115200n8
[    0.000000] Initializing CPU#0
[    0.000000] PID hash table entries: 4096 (order: 12, 32768 bytes)
[    0.000000] Extended CMOS year: 2000
[    0.000000] TSC: PIT calibration confirmed by PMTIMER.
[    0.000000] TSC: using PIT calibration value
[    0.000000] Detected 2799.937 MHz processor.
[    0.003333] Console: colour VGA+ 80x25
[    0.003333] console [tty0] enabled
[    0.003333] console [ttyS0] enabled
[    0.003333] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[    0.003333] ... MAX_LOCKDEP_SUBCLASSES:    8
[    0.003333] ... MAX_LOCK_DEPTH:          48
[    0.003333] ... MAX_LOCKDEP_KEYS:        8191
[    0.003333] ... CLASSHASH_SIZE:           4096
[    0.003333] ... MAX_LOCKDEP_ENTRIES:     8192
[    0.003333] ... MAX_LOCKDEP_CHAINS:      16384
[    0.003333] ... CHAINHASH_SIZE:          8192
[    0.003333]  memory used by lock dependency info: 3839 kB
[    0.003333]  per task-struct memory footprint: 1920 bytes
[    0.003333] Checking aperture...
[    0.003333] No AGP bridge found
[    0.003333] Node 0: aperture @ 0 size 32 MB
[    0.003333] Your BIOS doesn't leave a aperture memory hole
[    0.003333] Please enable the IOMMU option in the BIOS setup
[    0.003333] This costs you 64 MB of RAM
[    0.003333] Mapping aperture over 65536 KB of RAM @ 20000000
[    0.003333] PM: Registered nosave memory: 0000000020000000 - 0000000024000000
[    0.003333] Memory: 8108304k/9175040k available (3897k kernel code, 279268k reserved, 2242k data, 676k init)
[    0.003333] SLUB: Genslabs=13, HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=2
[    0.003333] Calibrating delay loop (skipped), value calculated using timer frequency.. 5602.20 BogoMIPS (lpj=9333123)
[    0.006694] Security Framework initialized
[    0.010740] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
[    0.019174] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
[    0.025693] Mount-cache hash table entries: 256
[    0.026995] Initializing cgroup subsys ns
[    0.030004] Initializing cgroup subsys cpuacct
[    0.033335] Initializing cgroup subsys memory
[    0.036675] Initializing cgroup subsys devices
[    0.040006] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[    0.043332] CPU: L2 Cache: 1024K (64 bytes/line)
[    0.046665] CPU 0/0 -> Node 0
[    0.050000] CPU: Physical Processor ID: 0
[    0.053331] CPU: Processor Core ID: 0
[    0.056670] using C1E aware idle routine
[    0.060022] ACPI: Core revision 20080609
[    0.070057] Getting VERSION: 80050010
[    0.073331] Getting VERSION: 80050010
[    0.076663] Getting ID: 0
[    0.079463] Getting ID: ff000000
[    0.079994] Getting LVT0: 10000
[    0.079997] Getting LVT1: 10000
[    0.083331] masked ExtINT on CPU#0
[    0.086906] ENABLING IO-APIC IRQs
[    0.090269] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.128392] CPU0: Dual-Core AMD Opteron(tm) Processor 2220 stepping 03
[    0.134743] Using local APIC timer interrupts.
[    0.139993] Detected 12.499 MHz APIC timer.
[    0.143481] lockdep: fixing up alternatives.
[    0.146681] Booting processor 1/1 ip 6000
[    0.003333] Initializing CPU#1
[    0.003333] masked ExtINT on CPU#1
[    0.003333] Calibrating delay using timer specific routine.. 5602.22 BogoMIPS (lpj=9333160)
[    0.003333] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[    0.003333] CPU: L2 Cache: 1024K (64 bytes/line)
[    0.003333] CPU 1/1 -> Node 0
[    0.003333] CPU: Physical Processor ID: 0
[    0.003333] CPU: Processor Core ID: 1
[    0.003333] x86 PAT enabled: cpu 1, old 0x7040600070406, new 0x7010600070106
[    0.243706] CPU1: Dual-Core AMD Opteron(tm) Processor 2220 stepping 03
[    0.253359] lockdep: fixing up alternatives.
[    0.256663] Booting processor 2/2 ip 6000
[    0.003333] Initializing CPU#2
[    0.003333] masked ExtINT on CPU#2
[    0.003333] Calibrating delay using timer specific routine.. 5602.27 BogoMIPS (lpj=9333232)
[    0.003333] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[    0.003333] CPU: L2 Cache: 1024K (64 bytes/line)
[    0.003333] CPU 2/2 -> Node 1
[    0.003333] CPU: Physical Processor ID: 1
[    0.003333] CPU: Processor Core ID: 0
[    0.003333] x86 PAT enabled: cpu 2, old 0x7040600070406, new 0x7010600070106
[    0.353880] CPU2: Dual-Core AMD Opteron(tm) Processor 2220 stepping 03
[    0.363384] lockdep: fixing up alternatives.
[    0.366663] Booting processor 3/3 ip 6000
[    0.003333] Initializing CPU#3
[    0.003333] masked ExtINT on CPU#3
[    0.003333] Calibrating delay using timer specific routine.. 5602.26 BogoMIPS (lpj=9333225)
[    0.003333] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[    0.003333] CPU: L2 Cache: 1024K (64 bytes/line)
[    0.003333] CPU 3/3 -> Node 1
[    0.003333] CPU: Physical Processor ID: 1
[    0.003333] CPU: Processor Core ID: 1
[    0.003333] x86 PAT enabled: cpu 3, old 0x7040600070406, new 0x7010600070106
[    0.463866] CPU3: Dual-Core AMD Opteron(tm) Processor 2220 stepping 03
[    0.473316] Brought up 4 CPUs
[    0.476464] Total of 4 processors activated (22408.96 BogoMIPS).
[    0.480095] net_namespace: 968 bytes
[    0.483490] xor: automatically using best checksumming function: generic_sse
[    0.503303]    generic_sse:  8630.400 MB/sec
[    0.506635] xor: using function: generic_sse (8630.400 MB/sec)
[    0.510036] NET: Registered protocol family 16
[    0.513587] No dock devices found.
[    0.516836] TOM: 00000000d0000000 aka 3328M
[    0.519976] TOM2: 0000000230000000 aka 8960M
[    0.523324] ACPI: bus type pci registered
[    0.526780] PCI: MCFG configuration 0: base e0000000 segment 0 buses 0 - 4
[    0.529969] PCI: MCFG area at e0000000 reserved in E820
[    0.533407] PCI: Using MMCONFIG at e0000000 - e04fffff
[    0.536635] PCI: Using configuration type 1 for base access
[    0.551716] ACPI: Interpreter enabled
[    0.553302] ACPI: (supports S0 S1 S3 S4 S5)
[    0.559969] ACPI: Using IOAPIC for interrupt routing
[    0.573012] ACPI: PCI Root Bridge [PCI0] (0000:00)
[    0.573607] pci 0000:00:01.1: PME# supported from D3hot D3cold
[    0.576639] pci 0000:00:01.1: PME# disabled
[    0.580011] pci 0000:00:02.0: PME# supported from D0 D1 D2 D3hot D3cold
[    0.583303] pci 0000:00:02.0: PME# disabled
[    0.589979] pci 0000:00:02.1: PME# supported from D0 D1 D2 D3hot D3cold
[    0.593303] pci 0000:00:02.1: PME# disabled
[    0.596867] pci 0000:00:08.0: PME# supported from D0 D1 D2 D3hot D3cold
[    0.599971] pci 0000:00:08.0: PME# disabled
[    0.603353] pci 0000:00:09.0: PME# supported from D0 D1 D2 D3hot D3cold
[    0.606637] pci 0000:00:09.0: PME# disabled
[    0.610001] pci 0000:00:0a.0: PME# supported from D0 D1 D2 D3hot D3cold
[    0.613303] pci 0000:00:0a.0: PME# disabled
[    0.616665] pci 0000:00:0d.0: PME# supported from D0 D1 D2 D3hot D3cold
[    0.619969] pci 0000:00:0d.0: PME# disabled
[    0.623331] pci 0000:00:0f.0: PME# supported from D0 D1 D2 D3hot D3cold
[    0.626636] pci 0000:00:0f.0: PME# disabled
[    0.630232] pci 0000:00:06.0: transparent bridge
[    0.633303] PCI: bridge 0000:00:06.0 io port: [3000, 3fff]
[    0.636636] PCI: bridge 0000:00:06.0 32bit mmio: [d0100000, d01fffff]
[    0.639969] PCI: bridge 0000:00:06.0 32bit mmio pref: [d8000000, dfffffff]
[    0.643399] PCI: bridge 0000:00:0a.0 32bit mmio: [d0200000, d02fffff]
[    0.672620] ACPI: PCI Interrupt Link [LNK1] (IRQs 5 7 10 11 14 15 16 17 18 19 20 21 22 23) *0, disabled.
[    0.680119] ACPI: PCI Interrupt Link [LNK2] (IRQs 5 7 *10 11 14 15 16 17 18 19 20 21 22 23)
[    0.689632] ACPI: PCI Interrupt Link [LNK3] (IRQs 5 7 10 11 14 15 16 17 18 19 20 21 22 23) *0, disabled.
[    0.697156] ACPI: PCI Interrupt Link [LNK4] (IRQs 5 7 10 11 14 15 16 17 18 19 20 21 22 23) *0, disabled.
[    0.708161] ACPI: PCI Interrupt Link [LK1E] (IRQs 5 7 10 11 14 15 16 17 18 19 20 21 22 23) *0, disabled.
[    0.719233] ACPI: PCI Interrupt Link [LK2E] (IRQs 5 7 10 *11 14 15 16 17 18 19 20 21 22 23)
[    0.725213] ACPI: PCI Interrupt Link [LK3E] (IRQs 5 7 10 11 14 15 16 17 18 19 20 21 22 23) *0, disabled.
[    0.732884] ACPI: PCI Interrupt Link [LK4E] (IRQs 5 7 10 11 14 15 16 17 18 19 20 21 22 23) *0, disabled.
[    0.740492] ACPI: PCI Interrupt Link [LSMB] (IRQs 5 7 10 11 14 15 16 17 18 19 20 21 22 23) *0, disabled.
[    0.751495] ACPI: PCI Interrupt Link [LUS0] (IRQs 5 7 *10 11 14 15 16 17 18 19 20 21 22 23)
[    0.761253] ACPI: PCI Interrupt Link [LMA2] (IRQs 5 7 10 *11 14 15 16 17 18 19 20 21 22 23)
[    0.770941] ACPI: PCI Interrupt Link [LUS2] (IRQs *5 7 10 11 14 15 16 17 18 19 20 21 22 23)
[    0.780630] ACPI: PCI Interrupt Link [LMAC] (IRQs 5 7 *10 11 14 15 16 17 18 19 20 21 22 23)
[    0.790117] ACPI: PCI Interrupt Link [LAZA] (IRQs 5 7 10 11 14 15 16 17 18 19 20 21 22 23) *0, disabled.
[    0.800492] ACPI: PCI Interrupt Link [LPID] (IRQs 5 7 10 11 14 15 16 17 18 19 20 21 22 23) *0, disabled.
[    0.811945] ACPI: PCI Interrupt Link [LTID] (IRQs 5 7 10 *11 14 15 16 17 18 19 20 21 22 23)
[    0.818230] ACPI: PCI Interrupt Link [LSI1] (IRQs 5 7 *10 11 14 15 16 17 18 19 20 21 22 23)
[    0.827922] ACPI: PCI Interrupt Link [LSI2] (IRQs 5 7 10 *11 14 15 16 17 18 19 20 21 22 23)
[    0.837749] Linux Plug and Play Support v0.97 (c) Adam Belay
[    0.840034] pnp: PnP ACPI init
[    0.843313] ACPI: bus type pnp registered
[    0.852025] pnp: PnP ACPI: found 14 devices
[    0.853303] ACPI: ACPI bus type pnp unregistered
[    0.856900] SCSI subsystem initialized
[    0.860410] usbcore: registered new interface driver usbfs
[    0.863368] usbcore: registered new interface driver hub
[    0.866740] usbcore: registered new device driver usb
[    0.870337] PCI: Using ACPI for IRQ routing
[    0.873307] testing the IO APIC.......................
[    0.876640] 
[    0.880152] .................................... done.
[    0.896897] PCI-DMA: Disabling AGP.
[    0.900860] PCI-DMA: aperture base @ 20000000 size 65536 KB
[    0.903302] PCI-DMA: using GART IOMMU.
[    0.906637] PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture
[    0.910474] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 31
[    0.917011] hpet0: 3 32-bit timers, 25000000 Hz
[    0.921064] ACPI: RTC can wake from S4
[    0.933369] system 00:00: iomem range 0xffc00000-0xffffffff could not be reserved
[    0.941202] system 00:00: iomem range 0xfec00000-0xfec00fff could not be reserved
[    0.949032] system 00:00: iomem range 0xfee00000-0xfeefffff could not be reserved
[    0.956860] system 00:00: iomem range 0xfed00000-0xfed00fff has been reserved
[    0.964206] system 00:03: iomem range 0xd0000000-0xd0007fff has been reserved
[    0.971543] system 00:04: ioport range 0x1000-0x107f has been reserved
[    0.978269] system 00:04: ioport range 0x1080-0x10ff has been reserved
[    0.984993] system 00:04: ioport range 0x1400-0x147f has been reserved
[    0.991718] system 00:04: ioport range 0x1480-0x14ff has been reserved
[    0.998442] system 00:04: ioport range 0x1800-0x187f has been reserved
[    1.005175] system 00:04: ioport range 0x1880-0x18ff has been reserved
[    1.011904] system 00:04: ioport range 0x2440-0x247f has been reserved
[    1.018629] system 00:04: ioport range 0x2400-0x243f has been reserved
[    1.025360] system 00:07: ioport range 0x4d0-0x4d1 has been reserved
[    1.031914] system 00:07: ioport range 0xc05-0xc06 has been reserved
[    1.039645] pci 0000:00:06.0: PCI bridge, secondary bus 0000:01
[    1.045762] pci 0000:00:06.0:   IO window: 0x3000-0x3fff
[    1.051273] pci 0000:00:06.0:   MEM window: 0xd0100000-0xd01fffff
[    1.057572] pci 0000:00:06.0:   PREFETCH window: 0x000000d8000000-0x000000dfffffff
[    1.065488] pci 0000:00:0a.0: PCI bridge, secondary bus 0000:02
[    1.071606] pci 0000:00:0a.0:   IO window: disabled
[    1.076683] pci 0000:00:0a.0:   MEM window: 0xd0200000-0xd02fffff
[    1.082969] pci 0000:00:0a.0:   PREFETCH window: 0x000000d1000000-0x000000d10fffff
[    1.090895] pci 0000:00:0d.0: PCI bridge, secondary bus 0000:03
[    1.097008] pci 0000:00:0d.0:   IO window: disabled
[    1.102085] pci 0000:00:0d.0:   MEM window: disabled
[    1.107250] pci 0000:00:0d.0:   PREFETCH window: disabled
[    1.112843] pci 0000:00:0f.0: PCI bridge, secondary bus 0000:04
[    1.118959] pci 0000:00:0f.0:   IO window: disabled
[    1.124028] pci 0000:00:0f.0:   MEM window: disabled
[    1.129185] pci 0000:00:0f.0:   PREFETCH window: disabled
[    1.134800] bus: 00 index 0 io port: [0, ffff]
[    1.139437] bus: 00 index 1 mmio: [0, ffffffffffffffff]
[    1.144853] bus: 01 index 0 io port: [3000, 3fff]
[    1.149749] bus: 01 index 1 mmio: [d0100000, d01fffff]
[    1.155080] bus: 01 index 2 mmio: [d8000000, dfffffff]
[    1.165290] bus: 01 index 3 io port: [0, ffff]
[    1.169925] bus: 01 index 4 mmio: [0, ffffffffffffffff]
[    1.175343] bus: 02 index 0 mmio: [0, 0]
[    1.179459] bus: 02 index 1 mmio: [d0200000, d02fffff]
[    1.184789] bus: 02 index 2 mmio: [d1000000, d10fffff]
[    1.190122] bus: 02 index 3 mmio: [0, 0]
[    1.194249] bus: 03 index 0 mmio: [0, 0]
[    1.198369] bus: 03 index 1 mmio: [0, 0]
[    1.202486] bus: 03 index 2 mmio: [0, 0]
[    1.206602] bus: 03 index 3 mmio: [0, 0]
[    1.210720] bus: 04 index 0 mmio: [0, 0]
[    1.214836] bus: 04 index 1 mmio: [0, 0]
[    1.218951] bus: 04 index 2 mmio: [0, 0]
[    1.223067] bus: 04 index 3 mmio: [0, 0]
[    1.227203] NET: Registered protocol family 2
[    1.263618] IP route cache hash table entries: 262144 (order: 9, 2097152 bytes)
[    1.273549] TCP established hash table entries: 524288 (order: 11, 8388608 bytes)
[    1.286620] TCP bind hash table entries: 65536 (order: 9, 3670016 bytes)
[    1.296287] TCP: Hash tables configured (established 524288 bind 65536)
[    1.303127] TCP reno registered
[    1.313494] NET: Registered protocol family 1
[    1.318167] checking if image is initramfs... it is
[    1.418688] Freeing initrd memory: 1922k freed
[    1.424357] Simple Boot Flag at 0x37 set to 0x80
[    1.432006] audit: initializing netlink socket (disabled)
[    1.437664] type=2000 audit(1222726720.437:1): initialized
[    1.448905] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[    1.459521] VFS: Disk quotas dquot_6.5.1
[    1.463839] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    1.472140] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
[    1.479048] msgmni has been set to 15840
[    1.483400] async_tx: api initialized (async)
[    1.487965] io scheduler noop registered
[    1.492088] io scheduler anticipatory registered
[    1.496908] io scheduler deadline registered
[    1.501569] io scheduler cfq registered (default)
[    1.506513] pci 0000:00:00.0: Enabling HT MSI Mapping
[    1.714230] pci 0000:00:05.0: Enabling HT MSI Mapping
[    1.719503] pci 0000:00:05.1: Enabling HT MSI Mapping
[    1.724776] pci 0000:00:05.2: Enabling HT MSI Mapping
[    1.730049] pci 0000:00:06.0: Enabling HT MSI Mapping
[    1.735322] pci 0000:00:08.0: Enabling HT MSI Mapping
[    1.740590] pci 0000:00:09.0: Enabling HT MSI Mapping
[    1.745858] pci 0000:00:0a.0: Enabling HT MSI Mapping
[    1.751129] pci 0000:00:0d.0: Enabling HT MSI Mapping
[    1.756398] pci 0000:00:0f.0: Enabling HT MSI Mapping
[    1.761885] pcieport-driver 0000:00:0a.0: found MSI capability
[    1.768257] pcieport-driver 0000:00:0d.0: found MSI capability
[    1.774568] pcieport-driver 0000:00:0f.0: found MSI capability
[    1.786055] input: Power Button (FF) as /class/input/input0
[    1.791842] ACPI: Power Button (FF) [PWRF]
[    1.796247] input: Power Button (CM) as /class/input/input1
[    1.802015] ACPI: Power Button (CM) [PWRB]
[    1.806763] ACPI: CPU0 (power states: C1[C1] C2[C2] C3[C3])
[    1.812828] processor ACPI0007:00: registered as cooling_device0
[    1.819040] ACPI: Processor [C000] (supports 2 throttling states)
[    1.825531] processor ACPI0007:01: registered as cooling_device1
[    1.831830] processor ACPI0007:02: registered as cooling_device2
[    1.838137] processor ACPI0007:03: registered as cooling_device3
[    1.911989] Non-volatile memory driver v1.2
[    1.916534] Linux agpgart interface v0.103
[    1.920923] Serial: 8250/16550 driver4 ports, IRQ sharing disabled
[    1.927505] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[    1.934476] 00:0c: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[    1.940585] Floppy drive(s): fd0 is 1.44M
[    4.956690] floppy0: no floppy controllers found
[    4.963824] brd: module loaded
[    4.968171] loop: module loaded
[    4.971518] tun: Universal TUN/TAP device driver, 1.6
[    4.976759] tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
[    4.983339] console [netcon0] enabled
[    4.987198] netconsole: network logging started
[    4.991927] Uniform Multi-Platform E-IDE driver
[    4.996788] ide_generic: please use "probe_mask=0x3f" module parameter for probing all legacy ISA IDE ports
[    5.973537] hdb: CD-224E-N, ATAPI CD/DVD-ROM drive
[    6.563432] ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
[    6.568257] ide1 at 0x170-0x177,0x376 on irq 15
[    6.575067] hdb: ATAPI 24X CD-ROM drive, 256kB Cache
[    6.580450] Uniform CD-ROM driver Revision: 3.20
[    6.595380] megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16 00:01:03 EST 2006)
[    6.603180] megaraid: 2.20.5.1 (Release Date: Thu Nov 16 15:32:35 EST 2006)
[    6.610473] megasas: 00.00.04.01 Thu July 24 11:41:51 PST 2008
[    6.616664] Driver 'sd' needs updating - please use bus_type methods
[    6.623282] Driver 'sr' needs updating - please use bus_type methods
[    6.630795] ACPI: PCI Interrupt Link [LTID] enabled at IRQ 23
[    6.636763] sata_nv 0000:00:05.0: PCI INT A -> Link[LTID] -> GSI 23 (level, low) -> IRQ 23
[    6.645374] sata_nv 0000:00:05.0: Using SWNCQ mode
[    6.650671] scsi0 : sata_nv
[    6.653847] scsi1 : sata_nv
[    6.657012] ata1: SATA max UDMA/133 cmd 0x24d0 ctl 0x24c4 bmdma 0x2490 irq 23
[    6.664351] ata2: SATA max UDMA/133 cmd 0x24c8 ctl 0x24c0 bmdma 0x2498 irq 23
[    7.000376] ata1: SATA link down (SStatus 0 SControl 300)
[    7.333704] ata2: SATA link down (SStatus 0 SControl 300)
[    7.339588] ACPI: PCI Interrupt Link [LSI1] enabled at IRQ 22
[    7.345557] sata_nv 0000:00:05.1: PCI INT B -> Link[LSI1] -> GSI 22 (level, low) -> IRQ 22
[    7.354168] sata_nv 0000:00:05.1: Using SWNCQ mode
[    7.359414] scsi2 : sata_nv
[    7.362523] scsi3 : sata_nv
[    7.365685] ata3: SATA max UDMA/133 cmd 0x24e8 ctl 0x24dc bmdma 0x24a0 irq 22
[    7.373020] ata4: SATA max UDMA/133 cmd 0x24e0 ctl 0x24d8 bmdma 0x24a8 irq 22
[    7.707000] ata3: SATA link down (SStatus 0 SControl 300)
[    8.040331] ata4: SATA link down (SStatus 0 SControl 300)
[    8.046191] ACPI: PCI Interrupt Link [LSI2] enabled at IRQ 21
[    8.052159] sata_nv 0000:00:05.2: PCI INT C -> Link[LSI2] -> GSI 21 (level, low) -> IRQ 21
[    8.060776] sata_nv 0000:00:05.2: Using SWNCQ mode
[    8.066013] scsi4 : sata_nv
[    8.069132] scsi5 : sata_nv
[    8.072289] ata5: SATA max UDMA/133 cmd 0x2800 ctl 0x24f4 bmdma 0x24b0 irq 21
[    8.079622] ata6: SATA max UDMA/133 cmd 0x24f8 ctl 0x24f0 bmdma 0x24b8 irq 21
[    8.413705] ata5: SATA link down (SStatus 0 SControl 300)
[    8.747037] ata6: SATA link down (SStatus 0 SControl 300)
[    8.752793] pata_amd 0000:00:04.0: BAR 0: can't reserve I/O region [0x1f0-0x1f7]
[    8.760551] pata_amd 0000:00:04.0: failed to request/iomap BARs for port 0 (errno=-16)
[    8.768821] pata_amd 0000:00:04.0: BAR 2: can't reserve I/O region [0x170-0x177]
[    8.776567] pata_amd 0000:00:04.0: failed to request/iomap BARs for port 1 (errno=-16)
[    8.784836] pata_amd 0000:00:04.0: no available native port
[    8.790918] Fusion MPT base driver 3.04.07
[    8.795214] Copyright (c) 1999-2008 LSI Corporation
[    8.800298] Fusion MPT SPI Host driver 3.04.07
[    8.805026] Fusion MPT SAS Host driver 3.04.07
[    8.810358] ieee1394: raw1394: /dev/raw1394 device initialized
[    8.816877] ACPI: PCI Interrupt Link [LUS2] enabled at IRQ 20
[    8.822837] ehci_hcd 0000:00:02.1: PCI INT B -> Link[LUS2] -> GSI 20 (level, low) -> IRQ 20
[    8.831550] ehci_hcd 0000:00:02.1: EHCI Host Controller
[    8.837215] ehci_hcd 0000:00:02.1: new USB bus registered, assigned bus number 1
[    8.844998] ehci_hcd 0000:00:02.1: debug port 1
[    8.849747] ehci_hcd 0000:00:02.1: irq 20, io mem 0xd0041000
[    8.863348] ehci_hcd 0000:00:02.1: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
[    8.871323] usb usb1: configuration #1 chosen from 1 choice
[    8.877185] hub 1-0:1.0: USB hub found
[    8.881145] hub 1-0:1.0: 10 ports detected
[    8.987275] ACPI: PCI Interrupt Link [LUS0] enabled at IRQ 19
[    8.993239] ohci_hcd 0000:00:02.0: PCI INT A -> Link[LUS0] -> GSI 19 (level, low) -> IRQ 19
[    9.001945] ohci_hcd 0000:00:02.0: OHCI Host Controller
[    9.007531] ohci_hcd 0000:00:02.0: new USB bus registered, assigned bus number 2
[    9.015293] ohci_hcd 0000:00:02.0: irq 19, io mem 0xd0040000
[    9.075474] usb usb2: configuration #1 chosen from 1 choice
[    9.081323] hub 2-0:1.0: USB hub found
[    9.085280] hub 2-0:1.0: 10 ports detected
[    9.190297] USB Universal Host Controller Interface driver v3.0
[    9.196582] usbcore: registered new interface driver usblp
[    9.202265] Initializing USB Mass Storage driver...
[    9.207407] usbcore: registered new interface driver usb-storage
[    9.213610] USB Mass Storage support registered.
[    9.218573] PNP: PS/2 Controller [PNP0303:KBC0,PNP0f13:MSE0] at 0x60,0x64 irq 1,12
[    9.473636] serio: i8042 KBD port at 0x60,0x64 irq 1
[    9.479008] mice: PS/2 mouse device common for all mice
[    9.603338] md: linear personality registered for level -1
[    9.684166] md: raid0 personality registered for level 0
[    9.689677] md: raid1 personality registered for level 1
[    9.695181] md: raid10 personality registered for level 10
[    9.754596] raid6: int64x1   2559 MB/s
[    9.814595] raid6: int64x2   3271 MB/s
[    9.874588] raid6: int64x4   3205 MB/s
[    9.934602] raid6: int64x8   2272 MB/s
[    9.994596] raid6: sse2x1    3909 MB/s
[   10.054585] raid6: sse2x2    5182 MB/s
[   10.114594] raid6: sse2x4    5221 MB/s
[   10.118545] raid6: using algorithm sse2x4 (5221 MB/s)
[   10.123791] md: raid6 personality registered for level 6
[   10.129301] md: raid5 personality registered for level 5
[   10.134807] md: raid4 personality registered for level 4
[   10.140318] md: multipath personality registered for level -4
[   10.151370] device-mapper: ioctl: 4.14.0-ioctl (2008-04-23) initialised: dm-devel@redhat.com
[   10.160290] cpuidle: using governor ladder
[   10.164810] cpuidle: using governor menu
[   10.169007] usbcore: registered new interface driver usbhid
[   10.174781] usbhid: v2.6:USB HID core driver
[   10.179352] oprofile: using NMI interrupt.
[   10.184099] TCP cubic registered
[   10.187720] NET: Registered protocol family 10
[   10.193358] IPv6 over IPv4 tunneling driver
[   10.198469] NET: Registered protocol family 17
[   10.203498] RPC: Registered udp transport module.
[   10.208403] RPC: Registered tcp transport module.
[   10.213394] powernow-k8: Found 2 Dual-Core AMD Opteron(tm) Processor 2220 processors (4 cpu cores) (version 2.20.00)
[   10.220006] powernow-k8:    0 : fid 0x14 (2800 MHz), vid 0x8
[   10.230248] powernow-k8:    1 : fid 0x12 (2600 MHz), vid 0xa
[   10.236103] powernow-k8:    2 : fid 0x10 (2400 MHz), vid 0xc
[   10.241962] powernow-k8:    3 : fid 0xe (2200 MHz), vid 0xe
[   10.247733] powernow-k8:    4 : fid 0xc (2000 MHz), vid 0x10
[   10.253583] powernow-k8:    5 : fid 0xa (1800 MHz), vid 0x10
[   10.259441] powernow-k8:    6 : fid 0x2 (1000 MHz), vid 0x12
[   10.214172] powernow-k8:    0 : fid 0x14 (2800 MHz), vid 0x8
[   10.271343] powernow-k8:    1 : fid 0x12 (2600 MHz), vid 0xa
[   10.277212] powernow-k8:    2 : fid 0x10 (2400 MHz), vid 0xc
[   10.283072] powernow-k8:    3 : fid 0xe (2200 MHz), vid 0xe
[   10.288844] powernow-k8:    4 : fid 0xc (2000 MHz), vid 0x10
[   10.294703] powernow-k8:    5 : fid 0xa (1800 MHz), vid 0x10
[   10.300567] powernow-k8:    6 : fid 0x2 (1000 MHz), vid 0x12
[   10.307138] drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
[   10.313682] Freeing unused kernel memory: 676k freed
[   10.319217] Write protecting the kernel read-only data: 5672k
>> Loading modules 
   :: Scanning for scsi_wait_scan...scsi_wait_scan loaded.
   :: Scanning for aic7xxx...aic7xxx loaded.
   :: Scanning for arcmsr...arcmsr loaded.
   :: Scanning for atp870u...atp870u loaded.
   :: Scanning for 3w-xxxx...3w-xxxx loaded.
   :: Scanning for 3w-9xxx...3w-9xxx loaded.
   :: Scanning for aacraid...aacraid loaded.
   :: Scanning for dm-mirror...dm-log, dm-mirror loaded.
   :: Scanning for dm-snapshot...dm-snapshot loaded.
   :: Scanning for xfs...xfs loaded.
   :: Scanning for e1000...e1000 loaded.
   :: Scanning for tg3...libphy, tg3 loaded.
>> Activating mdev 
>> Determining root device... 
>> Mounting root... 
>> Booting (initramfs)..
INIT: version 2.86 booting

Gentoo Linux; http://www.gentoo.org/
 Copyright 1999-2007 Gentoo Foundation; Distributed under the GPLv2

Press I to enter interactive boot mode

 * Mounting proc at /proc ...                                             [ ok ]
 * Mounting sysfs at /sys ...                                             [ ok ]
 * Mounting /dev for udev ...                                             [ ok ]
 * Seeding /dev with needed nodes ...                                     [ ok ]
 * Starting udevd ...                                                     [ ok ]
 * Populating /dev with existing devices through uevents ...              [ ok ]
 * Letting udev process events ... *   udev loading module forcedeth
 *   udev loading module i2c_nforce2
 *   udev loading module k8temp
                                        [ ok ]
 * Finalizing udev configuration ...                                      [ ok ]
 * Mounting devpts at /dev/pts ...                                        [ ok ]
 * Checking root filesystem .../dev/sda3: clean, 395874/4006240 files, 1770311/8000000 blocks
                                           [ ok ]
 * Remounting root filesystem read/write ...                              [ ok ]
 * Using /etc/modules.autoload.d/kernel-2.6 as config:
 *   Loading module smsc37b787_wdt ...                                    [ ok ]
 * Autoloaded 1 module(s)
 * Setting up the Logical Volume Manager ...  /dev/cdrom: open failed: Read-only file system
  Attempt to close device '/dev/cdrom' which is not open.
  /dev/cdrom: open failed: Read-only file system
  Attempt to close device '/dev/cdrom' which is not open.
  No volume groups found
  No volume groups found
  No volume groups found
                              [ ok ]
 * Setting up dm-crypt mappings ...                                       [ ok ]
 * Checking all filesystems .../dev/sda1: clean, 68/24480 files, 71784/97636 blocks
ipp008.0 contains a file system with errors, check forced.
ipp008.0: Inode 228356567 is in use, but has dtime set.  FIXED.
ipp008.0: Inode 228356567 has imagic flag set.  

ipp008.0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
        (i.e., without -a or -p options)

 * Fsck could not correct all errors, manual repair needed
                                                                          [ !! ]
Give root password for maintenance
(or type Control-D to continue): 
(none) ~ # fsck -ay -C -v /dev/sda4
fsck 1.40.4 (31-Dec-2007)
e2fsck: Only one of the options -p/-a, -n or -y may be specified.
(none) ~ # fsck -y -C -v /dev/sda4
fsck 1.40.4 (31-Dec-2007)
e2fsck 1.40.4 (31-Dec-2007)
ipp008.0 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
[12491.499178] Kernel panic - not syncing: softlockup: hung tasks
[12491.499178] ------------[ cut here ]------------
[12491.499178] WARNING: at kernel/mutex.c:351 mutex_trylock+0x52/0x111()
[12491.499178] Modules linked in: smsc37b787_wdt k8temp i2c_nforce2 i2c_core forcedeth tg3 libphy e1000 xfs dm_snapshot dm_mirror dm_log aacraid 3w_9xxx 3w_xxxx atp870u arcmsr aic7xxx scsi_wait_scan
[12491.499178] Pid: 303, comm: kswapd0 Not tainted 2.6.27-rc5-22033-gd26acd9-dirty #2
[12491.499178] 
[12491.499178] Call Trace:
[12491.499178]  <IRQ>  [<ffffffff80235361>] warn_on_slowpath+0x51/0x77
[12491.499178]  [<ffffffff80235816>] release_console_sem+0x3e/0x1a1
[12491.499178]  [<ffffffff805c63de>] mutex_trylock+0x52/0x111
[12491.499178]  [<ffffffff8025f3cf>] crash_kexec+0x17/0xef
[12491.499178]  [<ffffffff803bbab9>] bust_spinlocks+0x15/0x30
[12491.499178]  [<ffffffff80235260>] panic+0x8f/0x13f
[12491.499178]  [<ffffffff80235816>] release_console_sem+0x3e/0x1a1
[12491.499178]  [<ffffffff80235816>] release_console_sem+0x3e/0x1a1
[12491.499178]  [<ffffffff80272551>] softlockup_tick+0x19e/0x1ab
[12491.499178]  [<ffffffff8023de88>] update_process_times+0x26/0x4b
[12491.499178]  [<ffffffff80250535>] tick_sched_timer+0x76/0xa7
[12491.499178]  [<ffffffff802504bf>] tick_sched_timer+0x0/0xa7
[12491.499178]  [<ffffffff80249dba>] __run_hrtimer+0x55/0x92
[12491.499178]  [<ffffffff8024a7e7>] hrtimer_interrupt+0xe3/0x15d
[12491.499178]  [<ffffffff8021c23c>] smp_apic_timer_interrupt+0x89/0xa9
[12491.499178]  [<ffffffff8020cbb6>] apic_timer_interrupt+0x66/0x70
[12491.499178]  <EOI>  [<ffffffff80266e5c>] res_counter_uncharge+0x1d/0x3f
[12491.499178]  [<ffffffff80252f1b>] lock_acquire+0x67/0x6d
[12491.499178]  [<ffffffff802c4dc3>] try_to_free_buffers+0x51/0xa0
[12491.499178]  [<ffffffff805c7c79>] _spin_lock+0x29/0x34
[12491.499178]  [<ffffffff802c4dc3>] try_to_free_buffers+0x51/0xa0
[12491.499178]  [<ffffffff802c4dc3>] try_to_free_buffers+0x51/0xa0
[12491.499178]  [<ffffffff80281196>] shrink_page_list+0x440/0x57f
[12491.499178]  [<ffffffff80280ae2>] isolate_lru_pages+0x62/0x1f3
[12491.499178]  [<ffffffff80281424>] shrink_inactive_list+0x133/0x3a0
[12491.499178]  [<ffffffff80281798>] shrink_zone+0x107/0x12a
[12491.499178]  [<ffffffff802824e3>] kswapd+0x303/0x4ac
[12491.499178]  [<ffffffff80280c73>] isolate_pages_global+0x0/0x2f
[12491.499178]  [<ffffffff805c5f64>] thread_return+0x3e/0xa8
[12491.499178]  [<ffffffff8024791a>] autoremove_wake_function+0x0/0x2e
[12491.499178]  [<ffffffff802821e0>] kswapd+0x0/0x4ac
[12491.499178]  [<ffffffff80247805>] kthread+0x47/0x76
[12491.499178]  [<ffffffff80230223>] schedule_tail+0x27/0x5f
[12491.499178]  [<ffffffff8020ce09>] child_rip+0xa/0x11
[12491.499178]  [<ffffffff8024767c>] kthreadd+0x167/0x18c
[12491.499178]  [<ffffffff802477be>] kthread+0x0/0x76
[12491.499178]  [<ffffffff8020cdff>] child_rip+0x0/0x11
[12491.499178] 
[12491.499178] ---[ end trace 6a9aab8b65c1a995 ]---
[12491.499178] ------------[ cut here ]------------
[12491.499178] WARNING: at kernel/smp.c:332 smp_call_function_mask+0x37/0x1d6()
[12491.499178] Modules linked in: smsc37b787_wdt k8temp i2c_nforce2 i2c_core forcedeth tg3 libphy e1000 xfs dm_snapshot dm_mirror dm_log aacraid 3w_9xxx 3w_xxxx atp870u arcmsr aic7xxx scsi_wait_scan
[12491.499178] Pid: 303, comm: kswapd0 Tainted: G        W 2.6.27-rc5-22033-gd26acd9-dirty #2
[12491.499178] 
[12491.499178] Call Trace:
[12491.499178]  <IRQ>  [<ffffffff80235361>] warn_on_slowpath+0x51/0x77
[12491.499178]  [<ffffffff80234f85>] print_oops_end_marker+0x9/0x20
[12491.499178]  [<ffffffff80235366>] warn_on_slowpath+0x56/0x77
[12491.499178]  [<ffffffff80256c53>] smp_call_function_mask+0x37/0x1d6
[12491.499178]  [<ffffffff8025f3cf>] crash_kexec+0x17/0xef
[12491.499178]  [<ffffffff8025f49e>] crash_kexec+0xe6/0xef
[12491.499178]  [<ffffffff8021b596>] native_smp_send_stop+0x1a/0x26
[12491.499178]  [<ffffffff80235266>] panic+0x95/0x13f
[12491.499178]  [<ffffffff80235816>] release_console_sem+0x3e/0x1a1
[12491.499178]  [<ffffffff80235816>] release_console_sem+0x3e/0x1a1
[12491.499178]  [<ffffffff80272551>] softlockup_tick+0x19e/0x1ab
[12491.499178]  [<ffffffff8023de88>] update_process_times+0x26/0x4b
[12491.499178]  [<ffffffff80250535>] tick_sched_timer+0x76/0xa7
[12491.499178]  [<ffffffff802504bf>] tick_sched_timer+0x0/0xa7
[12491.499178]  [<ffffffff80249dba>] __run_hrtimer+0x55/0x92
[12491.499178]  [<ffffffff8024a7e7>] hrtimer_interrupt+0xe3/0x15d
[12491.499178]  [<ffffffff8021c23c>] smp_apic_timer_interrupt+0x89/0xa9
[12491.499178]  [<ffffffff8020cbb6>] apic_timer_interrupt+0x66/0x70
[12491.499178]  <EOI>  [<ffffffff80266e5c>] res_counter_uncharge+0x1d/0x3f
[12491.499178]  [<ffffffff80252f1b>] lock_acquire+0x67/0x6d
[12491.499178]  [<ffffffff802c4dc3>] try_to_free_buffers+0x51/0xa0
[12491.499178]  [<ffffffff805c7c79>] _spin_lock+0x29/0x34
[12491.499178]  [<ffffffff802c4dc3>] try_to_free_buffers+0x51/0xa0
[12491.499178]  [<ffffffff802c4dc3>] try_to_free_buffers+0x51/0xa0
[12491.499178]  [<ffffffff80281196>] shrink_page_list+0x440/0x57f
[12491.499178]  [<ffffffff80280ae2>] isolate_lru_pages+0x62/0x1f3
[12491.499178]  [<ffffffff80281424>] shrink_inactive_list+0x133/0x3a0
[12491.499178]  [<ffffffff80281798>] shrink_zone+0x107/0x12a
[12491.499178]  [<ffffffff802824e3>] kswapd+0x303/0x4ac
[12491.499178]  [<ffffffff80280c73>] isolate_pages_global+0x0/0x2f
[12491.499178]  [<ffffffff805c5f64>] thread_return+0x3e/0xa8
[12491.499178]  [<ffffffff8024791a>] autoremove_wake_function+0x0/0x2e
[12491.499178]  [<ffffffff802821e0>] kswapd+0x0/0x4ac
[12491.499178]  [<ffffffff80247805>] kthread+0x47/0x76
[12491.499178]  [<ffffffff80230223>] schedule_tail+0x27/0x5f
[12491.499178]  [<ffffffff8020ce09>] child_rip+0xa/0x11
[12491.499178]  [<ffffffff8024767c>] kthreadd+0x167/0x18c
[12491.499178]  [<ffffffff802477be>] kthread+0x0/0x76
[12491.499178]  [<ffffffff8020cdff>] child_rip+0x0/0x11
[12491.499178] 
[12491.499178] ---[ end trace 6a9aab8b65c1a995 ]---
Comment 39 Thomas Gleixner 2008-09-30 02:03:13 UTC
> [    0.000000] Linux version 2.6.27-rc5-22033-gd26acd9-dirty

Can you please test 2.6.27-rc8 ?

We need a common base to look at. Your network investigations are a
separate issue. 2.6.27-rc5 is known to be problematic with C1E
machines and we really need to know whether we caught all corner
cases in mainline.

> [12491.499178] Kernel panic - not syncing: softlockup: hung tasks  -  6.3%   

Can you please disable the panic on softlockup ?

CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=0

Crashing the machine does not help as we can not gather more
information from the machine.

Can you trigger it reproducibly with fsck ?

Thanks,

	tglx
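
Before rebooting a test node, it can help to confirm that the build really carries the requested setting. This is a minimal sketch that looks up an option in the text of a kernel .config file; the helper name `kconfig_value` and the sample text are illustrative assumptions, not part of any kernel tooling.

```python
import re


def kconfig_value(config_text, option):
    """Return the value assigned to `option` in .config text, or None.

    Options that are switched off appear as "# CONFIG_FOO is not set"
    and therefore yield None here.
    """
    pattern = re.compile(r"^%s=(.*)$" % re.escape(option), re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1).strip() if match else None


# Hypothetical excerpt of a .config; a real file would be read from disk.
sample = """\
CONFIG_DETECT_SOFTLOCKUP=y
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=0
"""

value = kconfig_value(sample, "CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE")
print("panic on softlockup disabled" if value == "0" else "check the config")
```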
Comment 40 Cyrill Gorcunov 2008-09-30 10:04:32 UTC
> How do you find out what mode the kernel has decided to use on its own?
> Will nomwait screw with dynticks?

Booting with the "debug" option will print a message in the boot log (or not, in the case of default idle). About dynticks: not sure.
Comment 41 Anonymous Emailer 2008-09-30 13:10:43 UTC
Reply-To: josh@hoblitt.com

> ------- Comment #39 from tglx@linutronix.de  2008-09-30 02:03 -------
> > [    0.000000] Linux version 2.6.27-rc5-22033-gd26acd9-dirty
> 
> Can you please test 2.6.27-rc8 ?

Yes.

> We need a common base to look at. Your network investigations are a
> separate issue. 2.6.27-rc5 is known to be problematic with C1E
> machines and we really need to know whether we caught all corner
> cases in mainline.

I believe the fixes I need have been merged in from netdev-2.6.  Without
them the machine will experience other random failures that I believe
had been obscuring this issue.

> > [12491.499178] Kernel panic - not syncing: softlockup: hung tasks  -  6.3%  
> 
> Can you please disable the panic on softlockup ?
> 
> CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=0

Can do.

> Crashing the machine does not help as we can not gather more
> information from the machine.
> 
> Can you trigger it reproducibly with fsck ?

Maybe?  I got the panic on one of the machines again last night.  It
seems to come in waves on various different nodes and then go away for
weeks/months.

-J

--
Comment 42 Joshua Hoblitt 2008-09-30 14:22:44 UTC
> > ------- Comment #39 from tglx@linutronix.de  2008-09-30 02:03 -------
> > > [    0.000000] Linux version 2.6.27-rc5-22033-gd26acd9-dirty
> > 
> > Can you please test 2.6.27-rc8 ?
> 
> Yes.

I'm testing -rc8 w/o panic on softlockup on another machine right now, but one
of the machines running -rc5 from netdev-2.6 just had another oops that looks
different from the others.

[13444.209171] BUG: unable to handle kernel NULL pointer dereference at 0000000000000002
[13444.212042] IP: [<ffffffff80218e16>] powernowk8_target+0x61d/0x857
[13444.212042] PGD 2179c3067 PUD 22b438067 PMD 0 
[13444.212042] Oops: 0002 [1] SMP 
[13444.212042] CPU 0 
[13444.212042] Modules linked in: w83627hf hwmon_vid autofs4 smsc37b787_wdt i2c_nforce2 forcedeth i2c_core k8temp tg3 libphy e1000 xfs dm_snapshot dm_mirror dm_log aacraid 3w_9xxx 3w_xxxx atp870u arcmsr aic7xxx scsi_wait_scan
[13444.212042] Pid: 251, comm: kondemand/0 Not tainted 2.6.27-rc5-22033-gd26acd9-dirty #2
[13444.212042] RIP: 0010:[<ffffffff80218e16>]  [<ffffffff80218e16>] powernowk8_target+0x61d/0x857
[13444.212042] RSP: 0018:ffff88012fbebdc0  EFLAGS: 00010246
[13444.212042] RAX: 0000000000000000 RBX: ffff88012e502600 RCX: 0000000000000002
[13444.212042] RDX: 0000000000000008 RSI: 0000000000000002 RDI: ffff88012e502600
[13444.212042] RBP: 000000000000000c R08: 0000000000000008 R09: 0000000000000190
[13444.212042] R10: ffff88012fbebd50 R11: ffffffff803bac5e R12: 0000000000000002
[13444.212042] R13: 0000000000000012 R14: 0000000000000008 R15: 0000000000000000
[13444.212042] FS:  0000000042427950(0000) GS:ffffffff80858e00(0000) knlGS:0000000000000000
[13444.212042] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[13444.212042] CR2: 0000000000000002 CR3: 00000002179c2000 CR4: 00000000000006e0
[13444.212042] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[13444.212042] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[13444.212042] Process kondemand/0 (pid: 251, threadinfo ffff88012fbea000, task ffff88012fbe3c40)
[13444.212042] Stack:  0000000000000002 ffff880000000000 ffff88012e459600 0000001400000282
[13444.212042]  002ab98000000001 ffff8800000f4240 0000000000000001 000000062e459600
[13444.212042]  0000000000000000 0000000000000046 ffff88012e459600 000000000000007e
[13444.212042] Call Trace:
[13444.212042]  [<ffffffff8050d244>] ? do_dbs_timer+0x1e0/0x24f
[13444.212042]  [<ffffffff8050d064>] ? do_dbs_timer+0x0/0x24f
[13444.212042]  [<ffffffff8050d064>] ? do_dbs_timer+0x0/0x24f
[13444.212042]  [<ffffffff80244394>] ? run_workqueue+0xed/0x1ed
[13444.212042]  [<ffffffff8024433e>] ? run_workqueue+0x97/0x1ed
[13444.212042]  [<ffffffff80244ef7>] ? worker_thread+0xd8/0xe3
[13444.212042]  [<ffffffff8024791a>] ? autoremove_wake_function+0x0/0x2e
[13444.212042]  [<ffffffff80244e1f>] ? worker_thread+0x0/0xe3
[13444.212042]  [<ffffffff80247805>] ? kthread+0x47/0x76
[13444.212042]  [<ffffffff80230223>] ? schedule_tail+0x27/0x5f
[13444.212042]  [<ffffffff8020ce09>] ? child_rip+0xa/0x11
[13444.212042]  [<ffffffff8024767c>] ? kthreadd+0x167/0x18c
[13444.212042]  [<ffffffff802477be>] ? kthread+0x0/0x76
[13444.212042]  [<ffffffff8020cdff>] ? child_rip+0x0/0x11
[13444.212042] 
[13444.212042] 
[13444.212042] Code: c7 f7 31 6f 80 e8 b1 d0 01 00 e9 4c 01 00 00 8b 53 28 41 39 d6 74 0e 44 89 f6 48 c7 c7 42 32 6f 80 31 c0 eb e0 45 89 f0 44 89 e1 <48> c7 c2 7a 32 6f 80 48 c7 c6 f7 2b 6f 80 bf 02 00 00 00 31 c0 
[13444.212042] RIP  [<ffffffff80218e16>] powernowk8_target+0x61d/0x857
[13444.212042]  RSP <ffff88012fbebdc0>
[13444.212042] CR2: 0000000000000002
[13444.530077] ---[ end trace 7cca0ea284e8104a ]---
Comment 43 Cyrill Gorcunov 2008-10-01 07:29:23 UTC
Let's wait for the results of your -rc8 experiments.
Comment 44 Cyrill Gorcunov 2008-10-02 11:26:35 UTC
Hi Joshua, any news from -rc8?
Comment 45 Joshua Hoblitt 2008-10-03 13:11:43 UTC
-rc8 proved itself stable for a few days on two test nodes and it's being rolled out onto 32 production nodes as I type this.  The bug strikes somewhat randomly so it'll take time for it to pop up in -rc8 if it's still there.  Heavy jobs will be running over the weekend and should trip this issue if it's still in there...
Comment 46 Cyrill Gorcunov 2008-10-04 07:37:10 UTC
Thanks a lot for testing, Joshua! Please report the results to us in any case.
Comment 47 Cyrill Gorcunov 2008-10-08 09:56:13 UTC
Hi Joshua, did -rc8 prove to be more/less stable for you?
Comment 48 Joshua Hoblitt 2008-10-08 12:43:29 UTC
> ------- Comment #47 from gorcunov@gmail.com  2008-10-08 09:56 -------
> Hi Joshua, did -rc8 prove to be more/less stable for you?

It's not clear yet.  We've had two random deadlocks on different
machines running -rc8.  Unfortunately, neither machine had a kernel trace
on the console.

-J

--
Comment 49 Joshua Hoblitt 2008-10-08 18:52:50 UTC
We just had the 3rd or 4th -rc8 crash but at least this one had something on the console.

<Oct/03 08:45 am>This is ipp016.unknown_domain (Linux x86_64 2.6.27-rc8-00016-gd3a47e8) 10:04:01


<Oct/08 02:25 pm>ipp016 login: [452552.081281] do_IRQ: 2.179 No irq handler for vector
<Oct/08 02:29 pm>[452761.138609] BUG: spinlock lockup on CPU#2, perl/5830, ffff88022e412e58
<Oct/08 02:29 pm>[452761.646588] BUG: spinlock lockup on CPU#1, kswapd0/303, ffff88022e412e58
<Oct/08 02:29 pm>[452761.930596] BUG: spinlock lockup on CPU#0, sshd/1833, ffff88022e412e58
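
All three lockup reports above name the same lock address, which suggests every CPU is spinning on one lock. The following is a minimal sketch for grouping such console lines by lock address; the regular expression is an assumption based only on the line format seen in this report, and the function name is illustrative.

```python
import re
from collections import defaultdict

# Matches e.g. "BUG: spinlock lockup on CPU#2, perl/5830, ffff88022e412e58"
LOCKUP_RE = re.compile(
    r"BUG: spinlock lockup on CPU#(\d+), (\S+)/(\d+), ([0-9a-f]{16})"
)


def group_lockups(log_text):
    """Map each lock address to the list of (cpu, comm, pid) stuck on it."""
    by_lock = defaultdict(list)
    for cpu, comm, pid, lock in LOCKUP_RE.findall(log_text):
        by_lock[lock].append((int(cpu), comm, int(pid)))
    return dict(by_lock)


console = """\
<Oct/08 02:29 pm>[452761.138609] BUG: spinlock lockup on CPU#2, perl/5830, ffff88022e412e58
<Oct/08 02:29 pm>[452761.646588] BUG: spinlock lockup on CPU#1, kswapd0/303, ffff88022e412e58
<Oct/08 02:29 pm>[452761.930596] BUG: spinlock lockup on CPU#0, sshd/1833, ffff88022e412e58
"""

for lock, waiters in group_lockups(console).items():
    print(lock, "<-", waiters)
```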
Comment 50 Cyrill Gorcunov 2008-10-09 00:00:57 UTC
Thanks for the report, Joshua -- will check this evening.
Comment 51 Thomas Gleixner 2008-10-09 02:21:39 UTC
> <Oct/08 02:25 pm>ipp016 login: [452552.081281] do_IRQ: 2.179 No irq handler for vector

Hmm, this one is odd and might be the root of all evil. Have you seen that before ?

Thanks,

	tglx
Comment 52 Cyrill Gorcunov 2008-10-09 08:06:33 UTC
It seems some PCI-MSI doesn't have a vector assigned. Hmm.. What does /proc/interrupts say on this machine?
Comment 53 Joshua Hoblitt 2008-10-09 12:58:06 UTC
On Thu, Oct 09, 2008 at 08:06:34AM -0700, bugme-daemon@bugzilla.kernel.org wrote:
> ------- Comment #52 from gorcunov@gmail.com  2008-10-09 08:06 -------
> It seems some PCI-MSI doesn't have a vector assigned. Hmm.. What does
> /proc/interrupts say on this machine?

The sata_nv driver isn't in use.  The only PCI[e] devices that see real use are
forcedeth.c/eth0 and arcmsr.  forcedeth.c has MSI turned off as there have been
suspicions about MSI and this driver in the past. The exact options are
"options forcedeth msi=0 msix=0 max_interrupt_work=30".


           CPU0       CPU1       CPU2       CPU3       
  0:         43          0          2        373   IO-APIC-edge      timer
  1:          0          0          0          2   IO-APIC-edge      i8042
  4:          0          0          2        722   IO-APIC-edge      serial
  7:          1          0          0          0   IO-APIC-edge    
  9:          0          0          0          0   IO-APIC-fasteoi   acpi
 14:          0          1          0        198   IO-APIC-edge      ide0
 15:          0          0          0          0   IO-APIC-edge      ide1
 17:        347     852881     114490  163953647   IO-APIC-fasteoi   eth0
 18:         76      35497      59522    1300569   IO-APIC-fasteoi   arcmsr
 19:          0          0          0          0   IO-APIC-fasteoi   ohci_hcd:usb2
 20:          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb1
 21:          0          0          0          0   IO-APIC-fasteoi   sata_nv
 22:          0          0          0          0   IO-APIC-fasteoi   sata_nv
 23:          0          0          0          0   IO-APIC-fasteoi   sata_nv
NMI:          0          0          0          0   Non-maskable interrupts
LOC:    6977995    3235603   15722263    4343273   Local timer interrupts
RES:     201328     208390     181427     128665   Rescheduling interrupts
CAL:        717        945       2009      11092   function call interrupts
TLB:       9367      14367       2038       1608   TLB shootdowns
TRM:          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0   Threshold APIC interrupts
SPU:          0          0          0          0   Spurious interrupts
ERR:          1
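
The table above shows the eth0 and arcmsr interrupts landing almost entirely on CPU3. Below is a minimal sketch for quantifying that from /proc/interrupts-style text; the function names are illustrative, and the parser assumes the simple layout shown here (a CPU header row, then "label: counts... description").

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {label: (counts, description)}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())              # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:           # leading numeric columns only
            if not field.isdigit():
                break
            counts.append(int(field))
        table[label.strip()] = (counts, " ".join(fields[len(counts):]))
    return table


def busiest_cpu(counts):
    """Return (cpu_index, share_of_total) for the CPU with the most interrupts."""
    total = sum(counts)
    cpu = max(range(len(counts)), key=counts.__getitem__)
    return cpu, (counts[cpu] / total if total else 0.0)


# Rows copied from the output above.
sample = """\
           CPU0       CPU1       CPU2       CPU3
 17:        347     852881     114490  163953647   IO-APIC-fasteoi   eth0
 18:         76      35497      59522    1300569   IO-APIC-fasteoi   arcmsr
ERR:          1
"""

table = parse_interrupts(sample)
for label in ("17", "18"):
    counts, desc = table[label]
    cpu, share = busiest_cpu(counts)
    print(f"IRQ {label} ({desc}): CPU{cpu} handled {share:.1%}")
```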
Comment 54 Anonymous Emailer 2008-10-09 13:00:45 UTC
Reply-To: josh@hoblitt.com

On Thu, Oct 09, 2008 at 02:21:39AM -0700, bugme-daemon@bugzilla.kernel.org wrote:
> ------- Comment #51 from tglx@linutronix.de  2008-10-09 02:21 -------
> > <Oct/08 02:25 pm>ipp016 login: [452552.081281] do_IRQ: 2.179 No irq handler for vector
> 
> Hmm, this one is odd and might be the root of all evil. Have you seen that
> before ?

Yes, we've seen it before.  I can find several examples of it being
printed onto the console before a softlockup backtrace.  Eg.

--
[11532.103605] do_IRQ: 0.175 No irq handler for vector
<Sep/11 12:13 pm>[11532.103613] do_IRQ: 2.175 No irq handler for vector
<Sep/11 12:13 pm>[11532.103617] do_IRQ: 1.175 No irq handler for vector
<Sep/11 12:14 pm>[11560.779989] do_IRQ: 0.179 No irq handler for vector
--

-J

--
Comment 55 Cyrill Gorcunov 2008-10-13 07:25:54 UTC
I think it's time to ask the forcedeth people about it. Not sure who exactly should be asked.
Comment 56 Joshua Hoblitt 2008-10-13 18:22:50 UTC
I've requested that one of the forcedeth devs consider that possibility in bug #9047.
Comment 57 Joshua Hoblitt 2008-10-15 12:19:32 UTC
We hit the do_IRQ bug again.  Any idea on how to debug this?

<Oct/14 07:14 pm>ipp018 login: [969147.235296] do_IRQ: 1.179 No irq handler for vector
<Oct/14 07:14 pm>[969148.722531] do_IRQ: 1.187 No irq handler for vector
<Oct/14 07:14 pm>[969158.179808] do_IRQ: 0.187 No irq handler for vector
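
One low-effort way to look for a pattern is to tally these messages across all console logs. This is a minimal sketch, assuming the "N.M" pair printed by do_IRQ is the CPU number followed by the vector number; the function name is illustrative, and the pattern tolerates the message being wrapped across lines.

```python
import re
from collections import Counter

# Whitespace between words is matched loosely so wrapped lines still count.
NO_HANDLER_RE = re.compile(
    r"do_IRQ:\s+(\d+)\.(\d+)\s+No\s+irq\s+handler\s+for\s+vector"
)


def tally_unhandled(log_text):
    """Count 'No irq handler' messages per (cpu, vector) pair."""
    return Counter(
        (int(cpu), int(vector)) for cpu, vector in NO_HANDLER_RE.findall(log_text)
    )


console = """\
<Oct/14 07:14 pm>ipp018 login: [969147.235296] do_IRQ: 1.179 No irq
handler for vector
<Oct/14 07:14 pm>[969148.722531] do_IRQ: 1.187 No irq handler for vector
<Oct/14 07:14 pm>[969158.179808] do_IRQ: 0.187 No irq handler for vector
"""

pairs = tally_unhandled(console)
per_vector = Counter()
for (cpu, vector), count in pairs.items():
    per_vector[vector] += count
print("per (cpu, vector):", dict(pairs))
print("per vector:", dict(per_vector))
```

If the same few vectors keep recurring across nodes, that would point at a specific device or at stale vector assignments rather than random noise.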
Comment 58 Cyrill Gorcunov 2008-10-16 07:35:08 UTC
Joshua, I think I can take a look only on holidays (too busy now). Anyway, it would be great if you could turn off the modules you mentioned (especially forcedeth) and see if do_IRQ still fails, since I suspect it's a NIC problem.
Comment 59 Joshua Hoblitt 2008-10-16 13:08:44 UTC
We had another 4 crashes overnight: 2 with nothing on the console, 1 just saying that the kernel was panicking, and this:

<Oct/15 05:38 pm>ipp014 login: [823902.827952] BUG: spinlock wrong owner on CPU#2, ppImage/27000
<Oct/15 05:38 pm>[823902.828447]  lock: ffffffff80f425b8, .magic: dead4ead, .owner: ppImage/27000, .owner_cpu: 2
<Oct/15 05:38 pm>[823902.828447] Pid: 27000, comm: ppImage Not tainted 2.6.27-rc8-00016-gd3a47e8 #1
<Oct/15 05:38 pm>[823902.828447] 
<Oct/15 05:38 pm>[823902.828447] Call Trace:
<Oct/15 05:38 pm>[823902.828447]  [<ffffffff803bdf0e>] _raw_spin_unlock+0x51/0x80
<Oct/15 05:38 pm>[823902.828447]  [<ffffffff805c82f8>] _spin_unlock_irqrestore+0x27/0x2e
<Oct/15 05:38 pm>[823902.828447]  [<ffffffff802a27af>] __mem_cgroup_uncharge_common+0xcb/0x107
<Oct/15 05:38 pm>[823902.828447]  [<ffffffff8028d642>] page_remove_rmap+0x108/0x122
<Oct/15 05:38 pm>[823902.828447]  [<ffffffff80285e23>] unmap_vmas+0x47f/0x7ec
<Oct/15 05:38 pm>[823902.828447]  [<ffffffff80289afb>] unmap_region+0xb3/0x126
<Oct/15 05:38 pm>[823902.828447]  [<ffffffff8028a81b>] do_munmap+0x1f3/0x270
<Oct/15 05:38 pm>[823902.828447]  [<ffffffff805c7ae6>] __down_write_nested+0x34/0x9e

<Oct/15 05:38 pm>[823902.828447]  [<ffffffff8028b715>] sys_munmap+0x40/0x5a
<Oct/15 05:38 pm>[823902.828447]  [<ffffffff8020be2b>] system_call_fastpath+0x16/0x1b
<Oct/15 05:38 pm>[823902.828447] 
<Oct/15 05:47 pm>[824473.019627] BUG: unable to handle kernel NULL pointer dereference at 0000000000000158
<Oct/15 05:47 pm>[824473.021724] IP: [<ffffffff8022b04d>] pick_next_task_fair+0x7d/0xa9
<Oct/15 05:47 pm>[824473.021724] PGD 21fc8b067 PUD 22e41c067 PMD 0 
<Oct/15 05:47 pm>[824473.021724] Oops: 0000 [1] SMP 
<Oct/15 05:47 pm>[824473.021724] CPU 2 
<Oct/15 05:47 pm>[824473.021724] Modules linked in: w83627hf hwmon_vid autofs4 smsc37b787_wdt k8temp forcedeth i2c_nforce2 i2c_core tg3 libphy e1000 xfs dm_snapshot dm_mirror dm_log aacraid 3w_9xxx 3w_xxxx atp870u arcmsr aic7xxx scsi_wait_scan
<Oct/15 05:47 pm>[824473.021724] Pid: 27703, comm: ppImage Not tainted 2.6.27-rc8-00016-gd3a47e8 #1
<Oct/15 05:47 pm>[824473.021724] RIP: 0010:[<ffffffff8022b04d>]  [<ffffffff8022b04d>] pick_next_task_fair+0x7d/0xa9
<Oct/15 05:47 pm>[824473.021724] RSP: 0018:ffff880223d69d18  EFLAGS: 00010046
<Oct/15 05:47 pm>[824473.021724] RAX: 000001e2b8508e7b RBX: ffff88021fc79e40 RCX: 0000000000000000
<Oct/15 05:47 pm>[824473.021724] RDX: ffff88013624a100 RSI: ffff88013624a1c8 RDI: ffff880223c09010
<Oct/15 05:47 pm>[824473.021724] RBP: ffff880223d69d48 R08: 0000000000000000 R09: ffff88013624a1c8
<Oct/15 05:47 pm>[824473.021724] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
<Oct/15 05:47 pm>[824473.021724] R13: 0002edda956529cd R14: ffff88013624a100 R15: 0000000000000000
<Oct/15 05:47 pm>[824473.021724] FS:  00007f83fbab56f0(0000) GS:ffff88022fa0d300(0000) knlGS:0000000000000000
<Oct/15 05:47 pm>[824473.021724] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<Oct/15 05:47 pm>[824473.021724] CR2: 0000000000000158 CR3: 000000021e4ff000 CR4: 00000000000006e0
<Oct/15 05:47 pm>[824473.021724] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<Oct/15 05:47 pm>[824473.021724] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7<Oct/15 05:47 pm>0

<Oct/15 05:47 pm>[824473.021724] Process ppImage (pid: 27703, threadinfo ffff880223d68000, task ffff8800a6bb8f10)
<Oct/15 05:47 pm>[824473.021724] Stack:  ffff88013624a100 ffff88013624a100 ffffffff805dc840 ffff880223d69d88
<Oct/15 05:47 pm>[824473.021724]  0000000000000000 000000010ebcc2ea ffff880223d69dd8 ffffffff805c5f2a
<Oct/15 05:47 pm>[824473.021724]  0000000000000000 ffffffff805567ed ffff8800a6bb8f10 0000000000000000
<Oct/15 05:47 pm>[824473.021724] Call Trace:
<Oct/15 05:47 pm>[824473.021724]  [<ffffffff805c5f2a>] schedule+0x434/0x7e1
<Oct/15 05:47 pm>[824473.021724]  [<ffffffff805567ed>] ? tcp_connect+0x319/0x322
<Oct/15 05:47 pm>[824473.021724]  [<ffffffff8055974b>] ? tcp_v4_connect+0x3c1/0x422
<Oct/15 05:47 pm>[824473.021724]  [<ffffffff805c64ab>] schedule_timeout+0x1e/0xad
<Oct/15 05:47 pm>[824473.021724]  [<ffffffff805201cd>] ? release_sock+0x2f/0xd5
<Oct/15 05:47 pm>[824473.021724]  [<ffffffff80564a0f>] ? inet_stream_connect+0x137/0x241
<Oct/15 05:47 pm>[824473.021724]  [<ffffffff80247b2a>] ? autoremove_wake_function+0x0/0x2e
<Oct/15 05:47 pm>[824473.021724]  [<ffffffff8051eb3b>] ? sys_connect+0x6c/0x9c
<Oct/15 05:47 pm>[824473.021724]  [<ffffffff802a6290>] ? fget+0xb6/0xbe
<Oct/15 05:47 pm>[824473.021724]  [<ffffffff805c7c2a>] ? lockdep_sys_exit_thunk+0x35/0x67
<Oct/15 05:47 pm>[824473.021724]  [<ffffffff8039c9ac>] ? cap_file_fcntl+0x0/0x3
<Oct/15 05:47 pm>[824473.021724]  [<ffffffff802a61da>] ? fget+0x0/0xbe
<Oct/15 05:47 pm>[824473.021724]  [<ffffffff8020be2b>] ? system_call_fastpath+0x16/0x1b
<Oct/15 05:47 pm>[824473.021724] 
<Oct/15 05:47 pm>[824473.021724] 
<Oct/15 05:47 pm>[824473.021724] Code: ec e8 52 f5 ff ff 49 39 c4 76 11 49 8b 86 30 08 00 00 4d 8d 67 f0 48 89 43 28 eb 04 4c 8b 63 60 4c 89 e6 48 89 df e8 6a fa ff ff <49> 8b 9c 24 58 01 00 00 48 85 db 75 9f 48 8b 7d d0 49 83 ec 38 
<Oct/15 05:47 pm>[824473.021724] RIP  [<ffffffff8022b04d>] pick_next_task_fair+0x7d/0xa9

<Oct/15 05:47 pm>[824473.021724]  RSP <ffff880223d69d18>
<Oct/15 05:47 pm>[824473.021724] CR2: 0000000000000158
<Oct/15 05:47 pm>[824473.021724] ---[ end trace 0dbe4de46ac70cd0 ]---
<Oct/15 05:50 pm>[824473.077977] BUG: spinlock lockup on CPU#1, swapper/0, ffff88013624a100
<Oct/15 05:50 pm>[824473.077977] Pid: 0, comm: swapper Tainted: G      D   2.6.27-rc8-00016-gd3a47e8 #1
<Oct/15 05:50 pm>[824473.077977] 
                 <Oct/15 05:50 pm>[824473.077977] Call Trace:
<Oct/15 05:50 pm>[824473.077977]  <IRQ>  [<ffffffff803be0c4>] _raw_spin_lock+0xf9/0x120
<Oct/15 05:50 pm>[824473.077977]  [<ffffffff805c8248>] _spin_lock_irqsave+0x37/0x3f
<Oct/15 05:50 pm>[824473.077977]  [<ffffffff80231d47>] tg_shares_up+0xcf/0x195
<Oct/15 05:50 pm>[824473.077977]  [<ffffffff80231d47>] tg_shares_up+0xcf/0x195
<Oct/15 05:50 pm>[824473.077977]  [<ffffffff80231c78>] tg_shares_up+0x0/0x195
<Oct/15 05:50 pm>[824473.124229] BUG: spinlock lockup on CPU#0, ppImage/27409, ffff88013624a100
<Oct/15 05:50 pm>[824473.124229] Pid: 27409, comm: ppImage Tainted: G      D   2.6.27-rc8-00016-gd3a47e8 #1
<Oct/15 05:50 pm>[824473.124229] 
<Oct/15 05:50 pm>[824473.124229] Call Trace:
<Oct/15 05:50 pm>[824473.124229]  <IRQ>  [<ffffffff803be0c4>] _raw_spin_lock+0xf9/0x120
<Oct/15 05:50 pm>[824473.124229]  [<ffffffff805c8248>] _spin_lock_irqsave+0x37/0x3f
<Oct/15 05:50 pm>[824473.124229]  [<ffffffff80231d47>] tg_shares_up+0xcf/0x195
<Oct/15 05:50 pm>[824473.124229]  [<ffffffff80231d47>] tg_shares_up+0xcf/0x195
<Oct/15 05:50 pm>[824473.124229]  [<ffffffff80231c78>] tg_shares_up+0x0/0x195
<Oct/15 05:50 pm>[824473.124229]  [<ffffffff802296d8>] tg_nop+0x0/0x6

<Oct/15 05:50 pm>[824473.124229]  [<ffffffff8022adb5>] walk_tg_tree+0x90/0xbc
<Oct/15 05:50 pm>[824473.124229]  [<ffffffff8022ad25>] walk_tg_tree+0x0/0xbc
<Oct/15 05:50 pm>[824473.124229]  [<ffffffff8022e3c2>] try_to_wake_up+0xb0/0x238
<Oct/15 05:50 pm>[824473.124229]  [<ffffffff80229d46>] __wake_up_common+0x41/0x74
<Oct/15 05:50 pm>[824473.124229]  [<ffffffff8022a8d2>] __wake_up_sync+0x3a/0x56
<Oct/15 05:50 pm>[824473.124229]  [<ffffffff80521e4a>] sock_def_readable+0x3d/0x6a
<Oct/15 05:50 pm>[824473.124229]  [<ffffffff80553256>] tcp_rcv_established+0x60f/0x8e5
<Oct/15 05:50 pm>[824473.124229]  [<ffffffff805586a3>] tcp_v4_do_rcv+0x2c/0x1b7
<Oct/15 05:50 pm>[824473.124229]  [<ffffffff80559e93>] tcp_v4_rcv+0x263/0x654
<Oct/15 05:50 pm>[824473.124229]  [<ffffffff8055a231>] tcp_v4_rcv+0x601/0x654
<Oct/15 05:50 pm>[824473.124229]  [<ffffffff80540d01>] ip_local_deliver+0xcb/0x164
<Oct/15 05:50 pm>[824473.124229]  [<ffffffff80540c91>] ip_local_deliver+0x5b/0x164
<Oct/15 05:50 pm>[824473.124229]  [<ffffffff80541294>] ip_rcv+0x4fa/0x552
<Oct/15 05:50 pm>[824473.124229]  [<ffffffff8052904b>] netif_receive_skb+0x24f/0x298
<Oct/15 05:50 pm>[824473.124229]  [<ffffffff80528f03>] netif_receive_skb+0x107/0x298
<Oct/15 05:50 pm>[824473.124229]  [<ffffffff8052b795>] process_backlog+0x71/0xbe
<Oct/15 05:50 pm>[824473.124229]  [<ffffffff8052b07e>] net_rx_action+0xc5/0x1fb
<Oct/15 05:50 pm>[824473.124229]  [<ffffffff8052b010>] net_rx_action+0x57/0x1fb
<Oct/15 05:50 pm>[824473.124229]  [<ffffffff80239d7a>] __do_softirq+0x5e/0xcd
<Oct/15 05:50 pm>[824473.124229]  [<ffffffff8021d70b>] ack_apic_level+0x12/0xd3
<Oct/15 05:50 pm>[824473.124229]  [<ffffffff8020d17c>] call_softirq+0x1c/0x28
<Oct/15 05:50 pm>[824473.124229]  [<ffffffff8020e82a>] do_softirq+0x2c/0x68

<Oct/15 05:50 pm>[824473.124229]  [<ffffffff80239cd6>] irq_exit+0x3f/0x85
<Oct/15 05:50 pm>[824473.124229]  [<ffffffff8020e9ca>] do_IRQ+0x13c/0x15e
<Oct/15 05:50 pm>[824473.124229]  [<ffffffff8020c3f1>] ret_from_intr+0x0/0xa
<Oct/15 05:50 pm>[824473.124229]  <EOI> 
<Oct/15 05:50 pm>[824473.329415] BUG: spinlock lockup on CPU#2, ppImage/27703, ffff88013624a100
<Oct/15 05:50 pm>[824473.329415] Pid: 27703, comm: ppImage Tainted: G      D   2<Oct/15 05:50 pm>d3a47e8 #1
                 <Oct/15 05:50 pm>[824473.329415] 
<Oct/15 05:50 pm>[824473.329415] Call Trace:
<Oct/15 05:50 pm>[824473.329415]  <IRQ>  [<ffffffff803be0c4>] _raw_spin_lock+0xf9/0x120
<Oct/15 05:50 pm>[824473.329415]  [<ffffffff80232437>] scheduler_tick+0x47/0x1b8
<Oct/15 05:50 pm>[824473.329415]  [<ffffffff8023e0b1>] update_process_times+0x3f/0x4b
<Oct/15 05:50 pm>[824473.329415]  [<ffffffff802506fa>] tick_sched_timer+0x76/0xa7
<Oct/15 05:50 pm>[824473.329415]  [<ffffffff80250684>] tick_sched_timer+0x0/0xa7
<Oct/15 05:50 pm>[824473.329415]  [<ffffffff80249fce>] __run_hrtimer+0x5a/0x97
<Oct/15 05:50 pm>[824473.329415]  [<ffffffff8024a9fb>] hrtimer_interrupt+0xe3/0x15d
<Oct/15 05:50 pm>[824473.329415]  [<ffffffff8021c29c>] smp_apic_timer_interrupt+0x89/0xa9
<Oct/15 05:50 pm>[824473.329415]  [<ffffffff8020cbc6>] apic_timer_interrupt+0x66/0x70
<Oct/15 05:50 pm>[824473.329415]  <EOI>  [<ffffffff80241808>] exit_signals+0xf4/0x10c
<Oct/15 05:50 pm>[824473.329415]  [<ffffffff805c81ee>] _spin_unlock_irq+0x21/0x22
<Oct/15 05:50 pm>[824473.329415]  [<ffffffff802417c6>] exit_signals+0xb2/0x10c
<Oct/15 05:50 pm>[824473.329415]  [<ffffffff80238123>] do_exit+0xd6/0x778
<Oct/15 05:50 pm>[824473.329415]  [<ffffffff805c8851>] oops_begin+0x0/0x8c

<Oct/15 05:50 pm>[824473.329415]  [<ffffffff805ca732>] do_page_fault+0x73c/0x7f5
<Oct/15 05:50 pm>[824473.329415]  [<ffffffff803b457d>] __next_cpu+0x19/0x26
<Oct/15 05:50 pm>[824473.329415]  [<ffffffff805c844d>] error_exit+0x0/0x9a
<Oct/15 05:50 pm>[824473.329415]  [<ffffffff8022b04d>] pick_next_task_fair+0x7d/0xa9
<Oct/15 05:50 pm>[824473.329415]  [<ffffffff8022b04d>] pick_next_task_fair+0x7d/0xa9
<Oct/15 05:50 pm>[824473.329415]  [<ffffffff805c5f2a>] schedule+0x434/0x7e1
<Oct/15 05:50 pm>[824473.329415]  [<ffffffff805567ed>] tcp_connect+0x319/0x322
<Oct/15 05:50 pm>[824473.329415]  [<ffffffff8055974b>] tcp_v4_connect+0x3c1/0x422
<Oct/15 05:50 pm>[824473.329415]  [<ffffffff805c64ab>] schedule_timeout+0x1e/0xad
<Oct/15 05:50 pm>[824473.329415]  [<ffffffff805201cd>] release_sock+0x2f/0xd5
<Oct/15 05:50 pm>[824473.329415]  [<ffffffff80564a0f>] inet_stream_connect+0x137/0x241
<Oct/15 05:50 pm>[824473.329415]  [<ffffffff80247b2a>] autoremove_wake_function+0x0/0x2e
<Oct/15 05:50 pm>[824473.329415]  [<ffffffff8051eb3b>] sys_connect+0x6c/0x9c
<Oct/15 05:50 pm>[824473.329415]  [<ffffffff802a6290>] fget+0xb6/0xbe
<Oct/15 05:50 pm>[824473.329415]  [<ffffffff805c7c2a>] lockdep_sys_exit_thunk+0x35/0x67
<Oct/15 05:50 pm>[824473.329415]  [<ffffffff8039c9ac>] cap_file_fcntl+0x0/0x3
<Oct/15 05:50 pm>[824473.329415]  [<ffffffff802a61da>] fget+0x0/0xbe
<Oct/15 05:50 pm>[824473.329415]  [<ffffffff8020be2b>] system_call_fastpath+0x16/0x1b
<Oct/15 05:50 pm>[824473.329415] 
<Oct/15 05:50 pm>[824473.077977]  [<ffffffff802296d8>] tg_nop+0x0/0x6
<Oct/15 05:50 pm>[824473.077977]  [<ffffffff8022adb5>] walk_tg_tree+0x90/0xbc
<Oct/15 05:50 pm>[824473.077977]  [<ffffffff8022ad25>] walk_tg_tree+0x0/0xbc

<Oct/15 05:50 pm>[824473.077977]  [<ffffffff8022e6b0>] rebalance_domains+0x13b/0x3f7
<Oct/15 05:50 pm>[824473.077977]  [<ffffffff80231990>] run_rebalance_domains+0x42/0xd8
<Oct/15 05:50 pm>[824473.077977]  [<ffffffff80239d7a>] __do_softirq+0x5e/0xcd
<Oct/15 05:50 pm>[824473.077977]  [<ffffffff8020d17c>] call_softirq+0x1c/0x28
<Oct/15 05:50 pm>[824473.077977]  [<ffffffff8020e82a>] do_softirq+0x2c/0x68
<Oct/15 05:50 pm>[824473.077977]  [<ffffffff80239cd6>] irq_exit+0x3f/0x85
<Oct/15 05:50 pm>[824473.077977]  [<ffffffff8021c2a1>] smp_apic_timer_interrupt+0x8e/0xa9
<Oct/15 05:50 pm>[824473.077977]  [<ffffffff8020cbc6>] apic_timer_interrupt+0x66/0x70
<Oct/15 05:50 pm>[824473.077977]  <EOI>  [<ffffffff802123da>] c1e_idle+0x0/0xe9
<Oct/15 05:50 pm>[824473.077977]  [<ffffffff805ca841>] __atomic_notifier_call_chain+0x0/0x83
<Oct/15 05:50 pm>[824473.077977]  [<ffffffff802122c5>] default_idle+0x27/0x3b
<Oct/15 05:50 pm>[824473.077977]  [<ffffffff802124bf>] c1e_idle+0xe5/0xe9
<Oct/15 05:50 pm>[824473.077977]  [<ffffffff8020abde>] cpu_idle+0x88/0xae
<Oct/15 05:50 pm>[824473.077977] 
<Oct/15 05:51 pm>[824473.022141] BUG: spinlock lockup on CPU#3, ppImage/27440, ffff88013624a100
<Oct/15 05:51 pm>[824473.022141] Pid: 27440, comm: ppImage Tainted: G      D   2.6.27-rc8-00016-gd3a47e8 #1
<Oct/15 05:51 pm>[824473.022141] 
                 <Oct/15 05:51 pm>[824473.022141] Call Trace:
<Oct/15 05:51 pm>[824473.022141]  <IRQ>  [<ffffffff803be0c4>] _raw_spin_lock+0xf9/0x120
<Oct/15 05:51 pm>[824473.022141]  [<ffffffff805c8248>] _spin_lock_irqsave+0x37/0x3f
<Oct/15 05:51 pm>[824473.022141]  [<ffffffff80231d47>] tg_shares_up+0xcf/0x195
<Oct/15 05:51 pm>[824473.022141]  [<ffffffff80231d47>] tg_shares_up+0xcf/0x195

<Oct/15 05:51 pm>[824473.022141]  [<ffffffff80231c78>] tg_shares_up+0x0/0x195
<Oct/15 05:51 pm>[824473.022141]  [<ffffffff802296d8>] tg_nop+0x0/0x6
<Oct/15 05:51 pm>[824473.022141]  [<ffffffff8022adb5>] walk_tg_tree+0x90/0xbc
<Oct/15 05:51 pm>[824473.022141]  [<ffffffff8022ad25>] walk_tg_tree+0x0/0xbc
<Oct/15 05:51 pm>[824473.022141]  [<ffffffff8022e3c2>] try_to_wake_up+0xb0/0x238
<Oct/15 05:51 pm>[824473.022141]  [<ffffffff80563ff6>] inet_sk_rebuild_header+0x19/0x31a
<Oct/15 05:51 pm>[824473.022141]  [<ffffffff80247b33>] autoremove_wake_function+0x9/0x2e
<Oct/15 05:51 pm>[824473.022141]  [<ffffffff80229d46>] __wake_up_common+0x41/0x74
<Oct/15 05:51 pm>[824473.022141]  [<ffffffff8022a926>] __wake_up+0x38/0x4f
<Oct/15 05:51 pm>[824473.022141]  [<ffffffff80520bb0>] sock_def_wakeup+0x3c/0x48
<Oct/15 05:51 pm>[824473.022141]  [<ffffffff8055b670>] tcp_init_congestion_control+0x19/0xc5
<Oct/15 05:51 pm>[824473.022141]  [<ffffffff8055250a>] tcp_rcv_state_process+0x2eb/0xa28
<Oct/15 05:51 pm>[824473.022141]  [<ffffffff805587d8>] tcp_v4_do_rcv+0x161/0x1b7
<Oct/15 05:51 pm>[824473.022141]  [<ffffffff80559e93>] tcp_v4_rcv+0x263/0x654
<Oct/15 05:51 pm>[824473.022141]  [<ffffffff8055a231>] tcp_v4_rcv+0x601/0x654
<Oct/15 05:51 pm>[824473.022141]  [<ffffffff80540d01>] ip_local_deliver+0xcb/0x164
<Oct/15 05:51 pm>[824473.022141]  [<ffffffff80540c91>] ip_local_deliver+0x5b/0x164
<Oct/15 05:51 pm>[824473.022141]  [<ffffffff80541294>] ip_rcv+0x4fa/0x552
<Oct/15 05:51 pm>[824473.022141]  [<ffffffff8052904b>] netif_receive_skb+0x24f/0x298
<Oct/15 05:51 pm>[824473.022141]  [<ffffffff80528f03>] netif_receive_skb+0x107/0x298
<Oct/15 05:51 pm>[824473.022141]  [<ffffffff8052b795>] process_backlog+0x71/0xbe
<Oct/15 05:51 pm>[824473.022141]  [<ffffffff8052b07e>] net_rx_action+0xc5/0x1fb

<Oct/15 05:51 pm>[824473.022141]  [<ffffffff8052b010>] net_rx_action+0x57/0x1fb
<Oct/15 05:51 pm>[824473.022141]  [<ffffffff80239d7a>] __do_softirq+0x5e/0xcd
<Oct/15 05:51 pm>[824473.022141]  [<ffffffff8021d70b>] ack_apic_level+0x12/0xd3
<Oct/15 05:51 pm>[824473.022141]  [<ffffffff8020d17c>] call_softirq+0x1c/0x28
<Oct/15 05:51 pm>[824473.022141]  [<ffffffff8020e82a>] do_softirq+0x2c/0x68
<Oct/15 05:51 pm>[824473.022141]  [<ffffffff80239cd6>] irq_exit+0x3f/0x85
<Oct/15 05:51 pm>[824473.022141]  [<ffffffff8020e9ca>] do_IRQ+0x13c/0x15e
<Oct/15 05:51 pm>[824473.022141]  [<ffffffff8020c3f1>] ret_from_intr+0x0/0xa
<Oct/15 05:51 pm>[824473.022141]  <EOI> 
Comment 60 Joshua Hoblitt 2008-10-16 13:55:53 UTC
Is it worthwhile to try disabling CONFIG_PCI_MSI ?
Comment 61 Cyrill Gorcunov 2008-10-16 22:17:12 UTC
iirc you were already turning msi off with a boot option, so I'm not sure it will help, but you could try it (to be sure). I think analyzing the dump above could help -- will try on holidays. Thanks!
Comment 62 Cyrill Gorcunov 2008-10-18 02:37:56 UTC
heh -- your OOPs looks similar to http://lkml.org/lkml/2008/5/13/212
checking...
Comment 63 Cyrill Gorcunov 2008-10-18 03:13:47 UTC
I mean having CR2: 0000000000000158 is quite strange.
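
For reference, the CR2 value is at least consistent with the faulting instruction in the oops from comment 59. The bytes starting at the `<49>` marker in the Code: line are `49 8b 9c 24 58 01 00 00`, and the register dump shows R12 as zero. The sketch below decodes only this one encoding by hand (it is not a general x86 decoder, and the function name is illustrative) and adds the displacement to R12; it reports base register r12 and displacement 0x158, so the faulting address is 0x158. That fits a load at offset 0x158 from a NULL pointer, although it says nothing about why the pointer was NULL.

```python
import struct


def decode_mov_base_disp32(insn):
    """Return (base_register_number, disp32) for one specific encoding:
    REX prefix, opcode 0x8b (mov r64, r/m64), ModRM with mod=10 and rm=100,
    a SIB byte without an index register, then a 32-bit displacement.
    """
    rex, opcode, modrm, sib = insn[0], insn[1], insn[2], insn[3]
    if rex & 0xF0 != 0x40 or opcode != 0x8B:
        raise ValueError("not a REX-prefixed mov r64, r/m64")
    if modrm >> 6 != 0b10 or modrm & 0b111 != 0b100:
        raise ValueError("expected SIB addressing with a 32-bit displacement")
    index = ((sib >> 3) & 0b111) | ((rex & 0b10) << 2)   # REX.X extends index
    if index != 0b100:
        raise ValueError("index register present; not handled here")
    base = (sib & 0b111) | ((rex & 0b1) << 3)            # REX.B extends base
    (disp,) = struct.unpack_from("<i", insn, 4)          # little-endian disp32
    return base, disp


faulting_insn = bytes.fromhex("49 8b 9c 24 58 01 00 00")
base, disp = decode_mov_base_disp32(faulting_insn)
r12 = 0x0000000000000000   # R12 as shown in the register dump
print(f"base register r{base}, displacement {disp:#x}, address {r12 + disp:#x}")
```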
Comment 64 Cyrill Gorcunov 2008-10-18 03:15:44 UTC
Do you have any KVM related stuff compiled in?
Comment 65 Joshua Hoblitt 2008-10-18 03:19:26 UTC
It's compiled in but not in use.
Comment 66 Cyrill Gorcunov 2008-10-18 03:20:40 UTC
I suppose it will be eliminated if CONFIG_SCHED_GROUP is disabled. It seems to be CFS related, an area in which I have no experience.
Comment 67 Cyrill Gorcunov 2008-10-18 08:00:35 UTC
Actually Dmitry suggested a debugging patch for this case
http://lkml.org/lkml/2008/5/14/91
so you may try it (from what I've read, this bug is still present in mainline)
Comment 68 Joshua Hoblitt 2008-10-18 11:44:14 UTC
Removing KVM isn't a big deal as we're not currently using it, but we would feel the loss of CONFIG_SCHED_GROUP.  We are running .27-rc8 from mainline now since what we needed from netdev-2.6 has been merged.

If this is indeed a CFS bug, I'd like to hear Ingo's or Thomas' thoughts on it before trying that patch.
Comment 69 Cyrill Gorcunov 2008-10-18 12:11:41 UTC
It might not be just CFS related but a combination of CFS, hrtimers and hotplug. Dmitry's patch just shows the CFS status if pick_next_task_fair fails.
Ingo, Thomas?
Comment 70 Cyrill Gorcunov 2008-10-19 08:57:45 UTC
I've asked Peter Zijlstra about the probability of a NULL deref in pick_next_task_fair, which is not supposed to happen, so I don't know how it happens. My only suspicion is that it happens as a side effect of some hotplug problem. Maybe a CPU core goes to sleep but doesn't update its working structures, and that leads to this NULL deref. Not sure, Joshua. I wish I could help, but it seems my knowledge is not enough :(
Comment 71 Cyrill Gorcunov 2008-10-28 10:52:05 UTC
Hi Joshua, any news from production? Same bugs? Did you maybe try a shiny new kernel?
Comment 72 Cyrill Gorcunov 2008-11-02 11:11:54 UTC
Joshua, could you confirm whether the problem is still there? I know it's tedious -- but maybe you could try the latest kernel (2.6.28-rc2)?
Comment 73 Joshua Hoblitt 2008-11-02 12:00:05 UTC
The problem is still there in .27.  I'm OK with trying new kernels, but it takes several days to roll one out to the production cluster, so I'd rather be testing some specific changeset that might address the issue.
Comment 74 Cyrill Gorcunov 2008-11-02 12:16:42 UTC
Joshua, does that mean the problem is in picking a new task? I mean -- is the oops message the same as the last one you posted (comment #59)?
Comment 75 Cyrill Gorcunov 2008-11-03 08:06:56 UTC
Joshua, are you here? :) What is the most reproducible problem? The same as #59?
Comment 76 Rafael J. Wysocki 2008-11-09 13:13:28 UTC
Handled-By : Cyrill Gorcunov <gorcunov@gmail.com>
Comment 77 Joshua Hoblitt 2008-11-13 13:50:26 UTC
Created attachment 18855 [details]
kernel panics from 2008-10-17 -> 2008-11-13
Comment 78 Joshua Hoblitt 2008-11-13 13:55:26 UTC
I'm sorry for the long delay in responding; I was out of the office for ~3 weeks.

Summary of the last several weeks:  We are still running 2.6.27-rc8 on 32 nodes in production. In the time period 2008-10-17 -> 2008-11-13 there were ~12 kernel panics with a backtrace (just added to the bug as an attachment) and another 19 deadlocks with either no output at all or something like this on the console:

do_IRQ: 0.179 No irq handler for vector
do_IRQ: 2.187 No irq handler for vector

The deadlocks/panics are most definitely triggered by heavy load, and past a certain point, even if the load is removed, the system will have a runaway load average (20-30+) and then crash.  Also worth noting is that some nodes have never experienced this mode of failure, while others seem to do it every few days.
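
For anyone triaging similar console captures, here is a minimal sketch of how the "No irq handler for vector" messages could be tallied per CPU and vector. It assumes the two numbers in "do_IRQ: 0.179" are the CPU number and the vector number, which is my reading of the message format rather than something stated in this report. The function name `tally_spurious_vectors` and the sample list are hypothetical; the sample lines are copied from the messages quoted in this bug.

```python
import re
from collections import Counter

# Matches lines such as "do_IRQ: 0.179 No irq handler for vector",
# with or without a leading "[seconds.micros]" printk timestamp.
# Assumption: the first number is the CPU, the second is the vector.
PATTERN = re.compile(r"do_IRQ: (\d+)\.(\d+) No irq handler for vector")


def tally_spurious_vectors(lines):
    """Return a Counter keyed by (cpu, vector) for unhandled-vector messages."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[(int(match.group(1)), int(match.group(2)))] += 1
    return counts


# Sample lines taken from the console output quoted in this bug.
sample = [
    "do_IRQ: 0.179 No irq handler for vector",
    "do_IRQ: 2.187 No irq handler for vector",
    "[11532.103605] do_IRQ: 0.175 No irq handler for vector",
    "[11560.779989] do_IRQ: 0.179 No irq handler for vector",
]

for (cpu, vector), n in sorted(tally_spurious_vectors(sample).items()):
    print(f"cpu {cpu} vector {vector}: {n}")
```

Run over the console logs of all nodes, a tally like this might show whether the same vectors recur on the nodes that crash every few days.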
Comment 79 Cyrill Gorcunov 2008-11-14 02:15:12 UTC
Hi Joshua, thanks for the data! I will take a look as soon as I get some spare time. Btw, could you attach the .config file please (if it's not a secret :)? I suppose all the problematic machines have the same config?
Comment 80 Cyrill Gorcunov 2008-11-17 00:03:56 UTC
Well, as I see it, there is a series of problems: first, a scheduler-related deadlock in tg_share_up; then the missing interrupt handlers; and, it seems, something in zone memory management too :). Joshua, could you turn on lock debugging first?
Comment 81 Joshua Hoblitt 2008-11-17 11:51:00 UTC
Created attachment 18890 [details]
2.6.27-rc8 .config
Comment 82 Joshua Hoblitt 2008-11-17 11:54:41 UTC
> ------- Comment #80 from gorcunov@gmail.com  2008-11-17 00:03 -------
> well as I see there is a series of problems. First - scheduler related
> deadlock by tg_share_up, and interrupt missing handlers and seems in zone
> memory management too :). Joshua could you turn on lock debugging at first?

This is a chunk of the current .config:

--
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_LOCK_ALLOC=y
# CONFIG_PROVE_LOCKING is not set
CONFIG_LOCKDEP=y
# CONFIG_LOCK_STAT is not set
# CONFIG_DEBUG_LOCKDEP is not set
# CONFIG_DEBUG_SPINLOCK_SLEEP is not set
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
--

Which lock debugging option would you like to have enabled?  I believe
that enabling CONFIG_PROVE_LOCKING in the past made the problem almost
completely unreproducible, but we may be fighting an entirely different
problem now.

-J

--
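
As a quick cross-check of the .config chunk above, here is a minimal sketch that reports the state of each lock-debugging option in a kernel .config. The helper name `parse_config` and the embedded sample text are hypothetical; the option names and their values are copied from the chunk quoted in this comment.

```python
# Option names copied from the .config chunk quoted above.
LOCK_DEBUG_OPTIONS = [
    "CONFIG_DEBUG_SPINLOCK",
    "CONFIG_DEBUG_MUTEXES",
    "CONFIG_DEBUG_LOCK_ALLOC",
    "CONFIG_PROVE_LOCKING",
    "CONFIG_LOCKDEP",
    "CONFIG_LOCK_STAT",
    "CONFIG_DEBUG_LOCKDEP",
    "CONFIG_DEBUG_SPINLOCK_SLEEP",
    "CONFIG_DEBUG_LOCKING_API_SELFTESTS",
]

NOT_SET_SUFFIX = " is not set"


def parse_config(text):
    """Map each CONFIG_* name to its value, using "n" for "is not set"."""
    state = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(NOT_SET_SUFFIX):
            # "# CONFIG_FOO is not set" -> CONFIG_FOO is disabled
            state[line[2:-len(NOT_SET_SUFFIX)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            state[name] = value
    return state


# Sample text: the chunk of the current .config posted in this comment.
sample_config = """\
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_LOCK_ALLOC=y
# CONFIG_PROVE_LOCKING is not set
CONFIG_LOCKDEP=y
# CONFIG_LOCK_STAT is not set
# CONFIG_DEBUG_LOCKDEP is not set
# CONFIG_DEBUG_SPINLOCK_SLEEP is not set
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
"""

state = parse_config(sample_config)
for option in LOCK_DEBUG_OPTIONS:
    # Options absent from the file are reported as "missing".
    print(f"{option}: {state.get(option, 'missing')}")
```

To check a real build tree, the sample string could be replaced with the contents of the .config file.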
Comment 83 Cyrill Gorcunov 2008-11-17 12:04:34 UTC
CONFIG_DEBUG_LOCKDEP, I think. Thanks for the config! At the least I would ask Peter to take a look (even a glance) at tg_share_up with the full lock chain taken. IIRC there were some fixes in the sched area since -rc8, but I'm not following the sched code that closely (unfortunately).
Comment 84 Anonymous Emailer 2008-11-17 12:24:27 UTC
Reply-To: josh@hoblitt.com

> ------- Comment #83 from gorcunov@gmail.com  2008-11-17 12:04 -------
> CONFIG_DEBUG_LOCKDEP i think. Thanks for config! At least I would ask Peter
> to take a look (even a glance) on tg_share_up with fool lock chain taken.
> IIRC there was some fixes in sched area since -rc8 but I'm not following
> sched code that much (unfortunatelly).

I'm making a build of 2.6.28-rc5 w/ CONFIG_DEBUG_LOCKDEP to try, but it
may be a while before there are any results due to the intermittency of
these crashes...

Thanks,

-J

--
Comment 85 Cyrill Gorcunov 2008-11-17 12:30:05 UTC
It's ok -- I think we may wait :)
Comment 86 Rafael J. Wysocki 2008-12-08 10:34:00 UTC
It appears we don't have sufficient information to find the root cause of the problem and we're not likely to collect it any time soon.

Closing the bug, please reopen if necessary.
