Bug 13553 - When NETCONSOLE is enabled in kernel, computer crashes after 120seconds (approx)
Summary: When NETCONSOLE is enabled in kernel, computer crashes after 120seconds (approx)
Status: RESOLVED UNREPRODUCIBLE
Alias: None
Product: Networking
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 high
Assignee: Arnaldo Carvalho de Melo
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-06-17 01:55 UTC by David Hill
Modified: 2011-06-29 00:23 UTC

See Also:
Kernel Version: 2.6.29.4, 2.6.30
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description David Hill 2009-06-17 01:55:53 UTC

    
Comment 1 David Hill 2009-06-17 01:56:57 UTC
00:00.0 Host bridge: Intel Corporation 440GX - 82443GX Host bridge
00:01.0 PCI bridge: Intel Corporation 440GX - 82443GX AGP bridge
00:07.0 ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 02)
00:07.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)
00:07.2 USB Controller: Intel Corporation 82371AB/EB/MB PIIX4 USB (rev 01)
00:07.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 02)
00:0b.0 SCSI storage controller: Adaptec AIC-7896U2/7897U2
00:0b.1 SCSI storage controller: Adaptec AIC-7896U2/7897U2
00:0d.0 Ethernet controller: Intel Corporation 82557/8/9/0/1 Ethernet Pro 100 (rev 08)
00:12.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
01:00.0 VGA compatible controller: ATI Technologies Inc Rage 128 RL/VR AGP
Comment 2 David Hill 2009-06-17 02:55:56 UTC
With NETCONSOLE enabled, if I type:
ethtool -s eth1 speed 100 duplex full autoneg on

the computer freezes with kernel 2.6.29.4 and 2.6.30...

I can reproduce it anytime you want.
Comment 3 Andrew Morton 2009-06-23 21:08:06 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Wed, 17 Jun 2009 01:55:54 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=13553
> 
>            Summary: When NETCONSOLE is enabled in kernel, computer crashes
>                     after 120seconds (approx)
>            Product: Networking
>            Version: 2.5
>     Kernel Version: 2.6.29.4, 2.6.30
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: Other
>         AssignedTo: acme@ghostprotocols.net
>         ReportedBy: hilld@binarystorm.net
>         Regression: No
> 
> 

> 00:00.0 Host bridge: Intel Corporation 440GX - 82443GX Host bridge
> 00:01.0 PCI bridge: Intel Corporation 440GX - 82443GX AGP bridge
> 00:07.0 ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 02)
> 00:07.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)
> 00:07.2 USB Controller: Intel Corporation 82371AB/EB/MB PIIX4 USB (rev 01)
> 00:07.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 02)
> 00:0b.0 SCSI storage controller: Adaptec AIC-7896U2/7897U2
> 00:0b.1 SCSI storage controller: Adaptec AIC-7896U2/7897U2
> 00:0d.0 Ethernet controller: Intel Corporation 82557/8/9/0/1 Ethernet Pro 100 (rev 08)
> 00:12.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
> 01:00.0 VGA compatible controller: ATI Technologies Inc Rage 128 RL/VR AGP
> 
> ------- Comment #2 From David Hill 2009-06-17 02:55:56 (-) [reply] -------
> 
> With NETCONSOLE enabled, if I type:
> ethtool -s eth1 speed 100 duplex full autoneg on
> 
> the computer freezes with kernel 2.6.29.4 and 2.6.30...
> 
> I can reproduce it anytime you want.
> 

Interesting.  I wonder what the significance is of the 120 seconds.  I
see no such timers in e100.c.  Does the networking core have timers on
such intervals?
Comment 4 Neil Horman 2009-06-24 01:05:05 UTC
On Tue, Jun 23, 2009 at 02:07:43PM -0700, Andrew Morton wrote:
> Interesting.  I wonder what the significance is of the 120 seconds.  I
> see no such timers in e100.c.  Does the networking core have timers on
> such intervals?
> 
My guess is the 120 seconds has less to do with the driver, and more to do with
some other periodic event in the kernel that triggers a message getting written
to the console, which in turn triggers whatever deadlock it is that's getting hit
here.  I imagine we could diagnose it pretty quickly if a stack trace or vmcore
could be captured on this.  David, can you enable the NMI watchdog on this
system to trigger a panic on the system after a deadlock?  Then if you could
enable a second serial console, or set up kdump to capture a vmcore on this
system, we should be able to figure out what's going on.  My guess is that in
the e100 driver we're taking a lock in the ethtool set path, then calling
printk, which winds up recursing into the driver, trying to take the same lock
again.  A stack trace will tell us for certain.

Regards
Neil
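
The suspected recursion can be illustrated with a small user-space model. This is only a sketch of the hypothesis above, not driver code: `driver_lock`, `netconsole_write`, `printk` and `ethtool_set_settings` are hypothetical Python stand-ins, and a timed acquire replaces the real hang so the example terminates.

```python
import threading

# Hypothetical stand-in for a driver-private lock. threading.Lock is
# non-reentrant, so a second acquire from the same context cannot succeed.
driver_lock = threading.Lock()
events = []


def netconsole_write(msg):
    """Stand-in for the console path re-entering the driver to transmit."""
    # The timeout replaces the real hang so this sketch can report it.
    if not driver_lock.acquire(timeout=0.1):
        events.append("deadlock: lock already held while sending %r" % msg)
        return False
    try:
        events.append("sent %r" % msg)
        return True
    finally:
        driver_lock.release()


def printk(msg):
    """Stand-in for printk() handing the message to the network console."""
    return netconsole_write(msg)


def ethtool_set_settings():
    """Stand-in for an ethtool set path that logs while holding the lock."""
    with driver_lock:
        return printk("link settings changed")


if __name__ == "__main__":
    ethtool_set_settings()
    print(events[-1])
```

In this model the message sent from inside `ethtool_set_settings()` can never be transmitted, while the same `printk()` call made without the lock held goes through. Whether the real freeze follows this pattern is exactly what a stack trace would confirm or rule out.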

Comment 5 David Hill 2009-07-16 05:42:22 UTC
Will try that in the next few days... sorry for the delay.  I was on 
vacation for the last 2 weeks and thus, out of town :D



Comment 6 David Hill 2009-07-17 05:55:38 UTC
Hi back,
Look at bug 13219.  I'm not sure the bug is related to NETCONSOLE.
It may be with the NIC drivers or the tools miidiag/ethtool or anything 
else.
The behavior of the system is random.

I attached the NMI stack trace ... but for the kdump, I need to read a bit 
more about it and think I'll need to patch the kernel... will I ?

Thanks again,

Dave


Comment 7 Neil Horman 2009-07-17 14:16:07 UTC
On Fri, Jul 17, 2009 at 01:55:44AM -0400, David Hill wrote:
> Hi back,
> Look at bug 13219.  I'm not sure the bug is related to NETCONSOLE.
> It may be with the NIC drivers or the tools miidiag/ethtool or anything  
> else.
> The behavior of the system is random.
>
> I attached the NMI stack trace ... but for the kdump, I need to read a 
> bit more about it and think I'll need to patch the kernel... will I ?
>
> Thanks again,
>
> Dave
>
Neither of the logs you attached in the associated bugs seems to have the NMI
lockup backtrace included.  As for a kdump, you won't need to patch the kernel,
no, but depending on what kernel you're using, you may need to build the kernel
with CONFIG_CRASH_DUMP and CONFIG_KEXEC turned on.

Neil
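
One way to check a kernel build for those options is to scan its config file. The sketch below is a hypothetical helper, not part of any kernel tooling. It assumes the crash-dump option meant here is `CONFIG_CRASH_DUMP` (comment 10 of this report refers to it as CRASH_DUMP), and a given kernel version may need further options.

```python
# Options named in this thread for kdump support. Treat the list as an
# assumption: other kernel versions may require more.
REQUIRED = ("CONFIG_KEXEC", "CONFIG_CRASH_DUMP")


def missing_options(config_text, required=REQUIRED):
    """Return the required options that are not set to y or m."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_FOO is not set".
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        if value in ("y", "m"):
            enabled.add(name)
    return [opt for opt in required if opt not in enabled]


if __name__ == "__main__":
    # Illustrative config fragment, not taken from the reporter's system.
    sample = "CONFIG_KEXEC=y\n# CONFIG_CRASH_DUMP is not set\n"
    print(missing_options(sample))
```

To use it on a real system, pass in the text of the kernel's config file. Its location varies by distribution and is not given in this report.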

Comment 8 David Hill 2010-02-28 04:55:49 UTC
I forgot this bug existed ... :S   Will try doing this.
Comment 9 David Hill 2010-02-28 05:00:43 UTC
And forget about the timer thing... it crashes only when I disconnect the ethernet cable (or reset the switch) ...
Comment 10 David Hill 2010-02-28 05:09:22 UTC
OK, I'm not quite sure what you expect me to try, but I guess I need to recompile my kernel with KEXEC=y (which is already the case) and CRASH_DUMP enabled, start the new kernel with kexec, unplug the ethernet adapter, and attach the dump to this bug report. Am I right?

Thank you very much.
Comment 11 David Hill 2011-06-29 00:23:23 UTC
You can close this bug report...
Comment 12 David Hill 2011-06-29 00:23:59 UTC
This is not reproducible and was induced by some other bugs back at that time.
