Bug 13219

Summary: Intel 440GX: Since kernel 2.6.30-rc1, computers hangs randomly but not with kernel <= 2.6.29.6
Product: Other Reporter: David Hill (hilld)
Component: Other Assignee: other_other
Status: CLOSED CODE_FIX    
Severity: blocking CC: alan, bugs-a21, devzero, for.poige+bugzilla.kernel.org, hilld, jarausch, mcdebugger, rjw, robgri
Priority: P1    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.30-rc1-8, 2.6.30, 2.6.30.1-2.6.30.3, 2.6.31-rc3-rc5 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 13070    
Attachments: .config used to compile kernel <= 2.6.30
Netconsole logs...
Last trace available.
Netconsole logs of the crash... nothing exceptional.
NMI watchdog detected a lockup ...
2.6.30.2 crash
2.6.31-rc4
Still crashes... anybody want to debug this? :)

Description David Hill 2009-05-01 16:57:27 UTC
The screen remains black and nothing is visible in the log files. 

I can debug if needed, apply patches if required.
Comment 1 David Hill 2009-05-10 06:24:32 UTC
Same thing with 2.6.30-rc5.

The computer locks up after a while.

It doesn't happen with 2.6.29.1 and 2.6.29.2.

I'm trying kernel 2.6.29.3 and hopefully it will work fine.

What can I send you or do to help debug this issue?
Comment 2 David Hill 2009-05-10 06:32:46 UTC
Linux version 2.6.29.3 (root@wolfe) (gcc version 4.3.3 (Debian 4.3.3-8) ) #1 SMP PREEMPT Sat May 9 12:15:05 EDT 2009
Comment 3 David Hill 2009-05-18 06:29:31 UTC
I enabled some debugging in the kernel... but no stack dump on the screen and still nothing in the log files.
Comment 4 David Hill 2009-05-18 06:51:29 UTC
Same problem with 2.6.30-rc6... computer freezes randomly.  

Bug not present in 2.6.29.3.
Comment 5 David Hill 2009-05-25 04:06:07 UTC
Same thing with kernel 2.6.30-rc7
Comment 6 Rafael J. Wysocki 2009-05-25 21:02:11 UTC
On Monday 25 May 2009, David Hill wrote:
> Same thing with kernel 2.6.30-rc7
> ;(
> 
> still crashes with no logs and no kernel dump... no nothing... simply freezes
Comment 7 David Hill 2009-05-27 02:38:27 UTC
By the way, the computer freezes less randomly with rc7: it freezes about 2 minutes after starting, so it's unusable.

At least with previous versions, it was sometimes able to stay up for some hours.

Is there anything I can do to help debug this issue?
Comment 8 David Hill 2009-06-10 04:04:51 UTC
Same thing with 2.6.30-rc8.
Computer freezes after a little while. (less than an hour)
Comment 9 David Hill 2009-06-11 04:07:55 UTC
Even kernel 2.6.30 crashes the system.
Comment 10 Igor M Podlesny 2009-06-16 02:09:30 UTC
(In reply to comment #9)
> Even kernel 2.6.30 crashes the system.

Yeah, it crashes my system too. The first time it happened, with 2.6.30, the system was running a GUI, so I had no chance to catch diagnostic messages on the screen. It just hung and needed a cold reboot. Another hang was caused by 2.6.29.4-rt18, also while in the GUI... Both hangs happened after ~1 day of uptime.

	Hardware:

-- CPU (in x86_64 mode): AMD Athlon(tm) 64 X2 Dual Core Processor 6000+
-- NVIDIA (+ their driver)

P. S. Until it's fixed I'm sticking to 2.6.29.4. ;-)
Comment 11 David Hill 2009-06-16 02:30:08 UTC
Same thing here.  Glad to see I'm not the only one.


00:00.0 Host bridge: Intel Corporation 440GX - 82443GX Host bridge
00:01.0 PCI bridge: Intel Corporation 440GX - 82443GX AGP bridge
00:07.0 ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 02)
00:07.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)
00:07.2 USB Controller: Intel Corporation 82371AB/EB/MB PIIX4 USB (rev 01)
00:07.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 02)
00:0b.0 SCSI storage controller: Adaptec AIC-7896U2/7897U2
00:0b.1 SCSI storage controller: Adaptec AIC-7896U2/7897U2
00:0d.0 Ethernet controller: Intel Corporation 82557/8/9/0/1 Ethernet Pro 100 (rev 08)
00:12.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
01:00.0 VGA compatible controller: ATI Technologies Inc Rage 128 RL/VR AGP
Comment 12 Igor M Podlesny 2009-06-16 03:37:53 UTC
(In reply to comment #11)
> Same thing here.  Glad to see I'm not the only one.

	My notebook runs it fine. But it lacks the LVM-2 and DM/MD-raid I'm using on the desktop (which 2.6.30 hangs). Can you mail me or post here your .config?
Comment 13 David Hill 2009-06-16 04:09:58 UTC
Created attachment 21929 [details]
.config used to compile kernel <= 2.6.30
Comment 14 Igor M Podlesny 2009-06-16 04:16:21 UTC
(In reply to comment #13)
> Created an attachment (id=21929) [details]
> .config used to compile kernel <= 2.6.30

It seems it's related neither to MD-raid nor to LVM-2. I guess the only real way to catch that bug is either (git's) patch bisection or hunting it down with a serial console attached...
Comment 15 David Hill 2009-06-16 04:33:36 UTC
I don't have the serial cable/USB thing to do the serial console ...
I guess the first option is the only one available.  But it will take ages unless we have at least a slight idea of which modification causes this hang.

The only common point we have is the SMP thing and the GUI (I guess X.org?) ...
Comment 16 Igor M Podlesny 2009-06-16 04:44:52 UTC
(In reply to comment #15)
> I don't have the serial cable/USB thing to do the serial console ...
> 

Try using the netconsole:

	http://www.mjmwired.net/kernel/Documentation/networking/netconsole.txt

It really helps a lot.
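
For reference, a minimal netconsole sketch. The parameter layout follows the netconsole documentation linked above; every address, port, interface name and MAC below is a hypothetical placeholder, not taken from this report, and the helper name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical values: replace with your own network details.
SRC_PORT=6665                 # UDP source port on the crashing machine
SRC_IP=192.168.1.10           # its IP address
DEV=eth0                      # interface the messages leave through
TGT_PORT=6666                 # UDP port the log receiver listens on
TGT_IP=192.168.1.20           # IP address of the log receiver
TGT_MAC=00:11:22:33:44:55     # MAC of the receiver (or of the gateway)

# Build the string in the documented layout:
#   src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac
netconsole_param() {
    printf '%s@%s/%s,%s@%s/%s\n' \
        "$SRC_PORT" "$SRC_IP" "$DEV" "$TGT_PORT" "$TGT_IP" "$TGT_MAC"
}

# Print the command instead of running it, so it can be reviewed first.
echo "modprobe netconsole netconsole=$(netconsole_param)"
```

Run the printed command as root on the machine that hangs, after starting a UDP listener on the receiver (for example netcat in UDP listen mode; the exact flags differ between netcat variants). When netconsole and the NIC driver are built into the kernel, the same string can go on the kernel command line as netconsole=... instead.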
Comment 17 Alan 2009-06-16 08:18:29 UTC
Igor. If you are using the binary Nvidia driver please try removing it (and the module it adds), rebooting so none of that code is loaded and using the nv or vesa driver for a bit. There may well be incompatibilities between it and 2.6.30 - which is something only Nvidia can tell or fix.
Comment 18 Alan 2009-06-16 08:20:21 UTC
Tagging with 440GX as that was in places quite an "interesting" chip so may be worth going back over the errata for anything we may now be triggering.
Comment 19 Igor M Podlesny 2009-06-16 09:28:34 UTC
(In reply to comment #17)
> Igor. If you are using the binary Nvidia driver please try removing it (and
> the
> module it adds), rebooting so none of that code is loaded and using the nv or
> vesa driver for a bit. There may well be incompatibilities between it and
> 2.6.30 - which is something only Nvidia can tell or fix.

I'm afraid I can't afford it -- I use NVIDIA's "TwinView", and fiddling with Xinerama (or the like) isn't much fun, considering it's a multi-seat desktop with 2 NVIDIA cards and I'm not its only user. Honestly, I think it's not the NVIDIA driver's fault. :-) I'll try to catch oops messages (if any) with netconsole as soon as it's possible without interrupting my wife's workflow, to avoid the risk and shame of being told "Your Linux hangs as badly as Windows does!". :-)
Comment 20 Alan 2009-06-16 09:44:45 UTC
If you have the Nvidia stuff loaded it's very unlikely any trace will be useful to anyone but Nvidia, I'm afraid.
Comment 21 Igor M Podlesny 2009-06-16 09:53:59 UTC
(In reply to comment #20)
> If you have the Nvidia stuff loaded it's very unlikely any trace will be
> useful
> to anyone but Nvidia I'm afraid

You're dramatizing the situation. If, e.g., it's a network bug, its traces are quite valuable in spite of using NVIDIA's driver. E.g., I hunted one down before: http://bugzilla.openvz.org/show_bug.cgi?id=1134 and I was using NVIDIA's driver then. ;-P
Comment 22 David Hill 2009-06-16 23:26:45 UTC
If I enable NETCONSOLE in kernel 2.6.30, the computer doesn't boot ... well, it boots and starts some services, but at some point it totally freezes.  I'm compiling it into 2.6.29.4 to see if this bug still happens.
Comment 23 David Hill 2009-06-17 00:49:42 UTC
Created attachment 21950 [details]
Netconsole logs...

It freezes when the NIC is loaded/put in promiscuous mode.
This behavior isn't happening with 2.6.29.4.
Comment 24 David Hill 2009-06-17 01:06:53 UTC
Sorry to contradict myself, but NETCONSOLE makes 2.6.29.4 crash as well.

Could my problem be NIC related?
Comment 25 Igor M Podlesny 2009-06-17 01:08:30 UTC
(In reply to comment #23)
> Created an attachment (id=21950) [details]
> Netconsole logs...
> 
> It freezes when the NIC is loaded/put in promiscuous mode.
> This behavior isn't happening with 2.6.29.4.

David, it seems it'd be better to create another bugzilla's entry for that issue with netconsole...
Comment 26 David Hill 2009-06-17 01:08:31 UTC
Created attachment 21951 [details]
Last trace available.

The NIC reports "[eth1] : lost link" and the computer freezes...
Comment 27 Igor M Podlesny 2009-06-18 11:29:42 UTC
(In reply to comment #23)
> Created an attachment (id=21950) [details]
> Netconsole logs...
> 
> It freezes when the NIC is loaded/put in promiscuous mode.
> This behavior isn't happening with 2.6.29.4.

	It's not happening with 2.6.29.4-rt18 either, so I've managed to start testing it under netconsole monitoring.
Comment 28 mcdebugger 2009-06-20 06:46:06 UTC
The same problem. Both with nvidia and ATI cards (opensource drivers) on the same board.
2.6.29 works fine (proprietary drivers too).
Comment 29 Igor M Podlesny 2009-06-20 07:08:24 UTC
(In reply to comment #28)
> The same problem. Both with nvidia and ATI cards (opensource drivers) on the
> same board.
> 2.6.29 works fine (proprietary drivers too).

Heh, I knew some kernel devs were rather biased against NVIDIA's "closed sources".
Comment 30 Igor M Podlesny 2009-07-06 04:11:42 UTC
Hi, David! Have you tried 2.6.30.1? I think there's a chance it could work ok.
Comment 31 Helmut Jarausch 2009-07-10 09:14:39 UTC
(In reply to comment #30)
> Hi, David! Have you tried 2.6.30.1? I think there's a chance it could work
> ok.

The same problem here with 2 older SMP machines (IDE disks) but different
controllers (one from serverworks and pdc202xx_new controller)

This is with gentoo's patched kernel 2.6.30-rc2 (from July, 4th)

2.6.29-r5 works just fine on those otherwise identical machines.

2.6.30-rc2  works just fine on recent hardware (quadcore AMD64 + SATA drivers).

It seems to be related to heavy writing to disk. When the system does only
bulk reads those systems have been stable for some days at least.
Comment 32 Igor M Podlesny 2009-07-10 09:18:20 UTC
test (please ignore this comment)
Comment 33 Igor M Podlesny 2009-07-10 09:20:34 UTC
(In reply to comment #31)
> (In reply to comment #30)
> > Hi, David! Have you tried 2.6.30.1? I think there's a chance it could work
> ok.
> 
> The same problem here with 2 older SMP machines (IDE disks) but different
> controllers (one from serverworks and pdc202xx_new controller)
[...]

	Helmut, replying to my comment is kinda wrong...
Comment 34 David Hill 2009-07-16 05:35:29 UTC
I'm trying 2.6.30.1 ... will get back shortly.
Comment 35 David Hill 2009-07-16 12:48:53 UTC
Same thing, the computer crashes with 2.6.30.1.
Comment 36 David Hill 2009-07-17 01:25:10 UTC
I'm not quite sure it's the problem, but I think the following commands make the computer crash:

mii-diag -A 100baseTx-FD eth1
mii-diag -F 100baseTx-FD eth1
ethtool -s eth1 speed 100 duplex full autoneg on
Comment 37 David Hill 2009-07-17 02:24:16 UTC
The computer is still up.  Will give some feedback tomorrow morning.
Comment 38 David Hill 2009-07-17 02:26:23 UTC
00:12.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
RTL-8139/8139C/8139C+ (rev 10)

This is eth1 ...
Comment 39 David Hill 2009-07-17 03:47:31 UTC
Well, there are two bugs that are puzzling me...

One with ethtool/mii-diag that crashes the NIC or the computer.
And the actual one where the computer randomly crashes.

And as Helmut Jarausch says, it seems to be related to heavy read/write on the hard disks... 

Need more help to debug this issue.

2.6.31-rc3 is a no go for me ...
Comment 40 David Hill 2009-07-17 03:49:55 UTC
Created attachment 22384 [details]
Netconsole logs of the crash... nothing exceptional.
Comment 41 David Hill 2009-07-17 05:45:49 UTC
Created attachment 22385 [details]
NMI watchdog detected a lockup ... 

I enabled the NMI watchdog since it doesn't seem to be enabled by default ... and it detected a lockup.

What I did was:

1) start a backup with tar
2) start a virus scan with fpscan (fprot)
3) system crashed after less than 5 minutes.
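
The NMI watchdog mentioned above can be switched on from the kernel command line. A small sketch for checking it, assuming a kernel of this era where nmi_watchdog=1 selects the IO-APIC as the NMI source and nmi_watchdog=2 the local APIC (as described in the kernel's Documentation/nmi_watchdog.txt); the helper name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the nmi_watchdog= value found in a kernel
# command line string, or "unset" when the option is absent.
nmi_watchdog_mode() {
    for word in $1; do
        case "$word" in
            nmi_watchdog=*) echo "${word#nmi_watchdog=}"; return 0 ;;
        esac
    done
    echo unset
}

# Inspect the running kernel when /proc is available.
if [ -r /proc/cmdline ]; then
    echo "nmi_watchdog: $(nmi_watchdog_mode "$(cat /proc/cmdline)")"
fi

# If the watchdog is active, the NMI counters keep increasing over time.
if [ -r /proc/interrupts ]; then
    grep NMI /proc/interrupts || echo "no NMI line in /proc/interrupts"
fi
```

If the value is unset, adding nmi_watchdog=1 (or 2) to the boot loader's kernel line and rebooting should be enough on these kernels; which of the two modes works depends on the hardware.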
Comment 42 David Hill 2009-07-23 01:00:52 UTC
Created attachment 22459 [details]
2.6.30.2 crash
Comment 43 David Hill 2009-07-24 01:45:03 UTC
I now have 1:14:32 uptime with 2.6.31-rc4...
Will keep you posted! :D
Comment 44 David Hill 2009-07-24 02:13:17 UTC
Well, it crashed! :'(
Comment 45 David Hill 2009-07-24 02:17:15 UTC
Created attachment 22478 [details]
2.6.31-rc4

Crashed again! :'(
Comment 46 David Hill 2009-07-29 02:05:02 UTC
Same thing with 2.6.30.3
Comment 47 David Hill 2009-07-29 04:33:35 UTC
Created attachment 22521 [details]
Still crashes... anybody want to debug this? :)
Comment 48 Igor M Podlesny 2009-07-29 05:01:17 UTC
(In reply to comment #47)
> Created an attachment (id=22521) [details]
> Still crashes... anybody want to debug this? :)

2.6.18-RHEL is your friend. :-)
Comment 49 David Hill 2009-07-29 12:18:03 UTC
(In reply to comment #48)
> (In reply to comment #47)
> > Created an attachment (id=22521) [details]
> > Still crashes... anybody want to debug this? :)
> 
> 2.6.18-RHEL is your friend. :-)

2.6.29.4 is also my friend... :)

I'll be trying 2.6.29.5 and 2.6.29.6 ... but I'm not sure it's the good path to follow.  Is it?
Comment 50 Igor M Podlesny 2009-07-29 13:44:05 UTC
(In reply to comment #49)
[…]
> I'll be trying 2.6.29.5 and 2.6.29.6 ... but I'm not sure it's the good path
> to
> follow.  Is it?

It's at least something! :-) Then, later you'd be able to use patch bisecting to narrow it and hunt it down finally. Good luck!
Comment 51 David Hill 2009-07-30 03:04:31 UTC
Kernel 2.6.29.5 seems to work fine.  I've been able to reach 6 hours uptime ... and it didn't crash.
Comment 52 David Hill 2009-07-30 04:54:48 UTC
Kernel 2.6.29.6 seems to work fine.  1h45 uptime ...
Comment 53 David Hill 2009-07-31 04:21:02 UTC
Ok, so now what?
patch bisecting is done how exactly?

I look at the git commit in git.kernel.org and there's lots of patches...  if I need to try each of them, with a kernel compilation time of 45 minutes, I'll still be here in 2000 years :/ 

Is there a faster way to narrow it down?

Thanks
Comment 54 David Hill 2009-07-31 04:41:25 UTC
Do I need to start from 2.6.29.4 and apply the patch-2.6.30-rc1-git1 found in ftp://ftp.kernel.org/pub/linux/kernel/v2.6/snapshots/?

The problem wasn't there with 2.6.29.4, 2.6.29.5 and 2.6.29.6.
It appeared in 2.6.30 and is present in all the latest version.

I guess it branched from 2.6.29.4 to 2.6.30-rc1-git1 ... Is that right?

Thank you very much.

BTW, there are only 7 git patches in rc1 ... so, unless I don't understand anything about git and the kernel (which may be the case), I'd only have 7 tests to do?

And dig into the broken one?
Thank you very much!!!
Comment 55 David Hill 2009-07-31 05:01:46 UTC
Or do I need to start from 2.6.29 and apply the patch-2.6.30-rc1-git1?

Again, thank you very much! :D
Comment 56 Igor M Podlesny 2009-07-31 05:48:29 UTC
(In reply to comment #53)
> Ok, so now what?
> patch bisecting is done how exactly?
> 
> I look at the git commit in git.kernel.org and there's lots of patches...  if
> I
> need to try each of them, with a kernel compilation time of 45 minutes, I'll
> still be here in 2000 years :/ 
> 
> Is there a faster way to narrow it down?
> 

	Well, as the name implies, bisecting splits the suspect range in two at every step, so it works pretty fast -- since 2 to the power N grows exponentially, about N test builds are enough to search 2^N commits.
	
	I've Googled for "git bisect" and here it is, the most promising article to help you:
	
	http://kerneltrap.org/node/11753
	
> Thanks

	Welcome. :-)
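
To make the mechanics concrete, here is a toy sketch of git bisect on a throwaway repository. The repository, file names and the position of the "regression" are all invented for the demonstration; nothing here touches a kernel tree.

```shell
#!/bin/sh
# Toy repository: 16 commits, with the "regression" introduced in commit 11.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "tester@example.invalid"
git config user.name "Bisect Demo"
git config commit.gpgsign false

i=1
while [ "$i" -le 16 ]; do
    if [ "$i" -ge 11 ]; then echo hang > state; else echo stable > state; fi
    echo "$i" > counter
    git add state counter
    git commit -q -m "commit $i"
    i=$((i + 1))
done

# Mark the newest commit bad and the oldest good, then let git drive the search.
git bisect start HEAD HEAD~15 > /dev/null
# The test command's exit status decides the verdict: 0 = good, 1 = bad.
git bisect run sh -c 'grep -q stable state' > /dev/null

# refs/bisect/bad now points at the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad: $first_bad"

git bisect reset > /dev/null 2>&1
cd /
rm -rf "$demo"
```

For a kernel hang the test cannot be scripted this easily, so the manual form applies: in a kernel git tree, run git bisect start, then git bisect bad v2.6.30 and git bisect good v2.6.29, build and boot the revision git checks out, and answer git bisect good or git bisect bad after each trial until git names the first bad commit.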
Comment 57 David Hill 2009-07-31 16:16:50 UTC
drivers/i2c/i2c-core.c: In function 'i2c_new_device':
drivers/i2c/i2c-core.c:285: warning: 'i2c_attach_client' is deprecated (declared at include/linux/i2c.h:434)
drivers/i2c/i2c-core.c: In function 'i2c_unregister_device':
drivers/i2c/i2c-core.c:312: warning: 'client_unregister' is deprecated (declared at include/linux/i2c.h:357)
drivers/i2c/i2c-core.c:313: warning: 'client_unregister' is deprecated (declared at include/linux/i2c.h:357)
drivers/i2c/i2c-core.c: In function 'i2c_del_adapter':
drivers/i2c/i2c-core.c:653: warning: 'detach_client' is deprecated (declared at include/linux/i2c.h:154)
drivers/i2c/i2c-core.c: In function 'i2c_register_driver':
drivers/i2c/i2c-core.c:719: warning: 'detach_client' is deprecated (declared at include/linux/i2c.h:154)
drivers/i2c/i2c-core.c: In function '__detach_adapter':
drivers/i2c/i2c-core.c:788: warning: 'detach_client' is deprecated (declared at include/linux/i2c.h:154)
drivers/i2c/i2c-core.c: In function 'i2c_attach_client':
drivers/i2c/i2c-core.c:869: warning: 'client_register' is deprecated (declared at include/linux/i2c.h:356)
drivers/i2c/i2c-core.c:870: warning: 'client_register' is deprecated (declared at include/linux/i2c.h:356)
drivers/i2c/i2c-core.c: At top level:
drivers/i2c/i2c-core.c:884: warning: 'i2c_attach_client' is deprecated (declared at drivers/i2c/i2c-core.c:835)
drivers/i2c/i2c-core.c:884: warning: 'i2c_attach_client' is deprecated (declared at drivers/i2c/i2c-core.c:835)
drivers/i2c/i2c-core.c: In function 'i2c_detach_client':
drivers/i2c/i2c-core.c:891: warning: 'client_unregister' is deprecated (declared at include/linux/i2c.h:357)
drivers/i2c/i2c-core.c:892: warning: 'client_unregister' is deprecated (declared at include/linux/i2c.h:357)
drivers/i2c/i2c-core.c: At top level:
drivers/i2c/i2c-core.c:912: warning: 'i2c_detach_client' is deprecated (declared at drivers/i2c/i2c-core.c:887)
drivers/i2c/i2c-core.c:912: warning: 'i2c_detach_client' is deprecated (declared at drivers/i2c/i2c-core.c:887)
drivers/ide/ide-taskfile.c: In function 'ide_pio_bytes':
drivers/ide/ide-taskfile.c:230: warning: 'flags' may be used uninitialized in this function
drivers/pci/search.c:145: warning: 'pci_find_slot' is deprecated (declared at drivers/pci/search.c:134)
drivers/pci/search.c:145: warning: 'pci_find_slot' is deprecated (declared at drivers/pci/search.c:134)
drivers/pci/search.c:174: warning: 'pci_find_device' is deprecated (declared at drivers/pci/search.c:166)
drivers/pci/search.c:174: warning: 'pci_find_device' is deprecated (declared at drivers/pci/search.c:166)
WARNING: modpost: Found 5 section mismatch(es).
Comment 58 Rafael J. Wysocki 2009-08-05 13:27:52 UTC
On Wednesday 05 August 2009, David Hill wrote:
> yes it should still be listed...
>
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.29 and 2.6.30.
> >
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.29 and 2.6.30.  Please verify if it still should
> > be listed and let me know (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13219
> > Subject : Intel 440GX: Since kernel 2.6.30-rc1, computers hangs randomly 
> > but not with kernel <= 2.6.29.6
> > Submitter : David Hill <hilld@binarystorm.net>
> > Date : 2009-05-01 16:57 (94 days old)
Comment 59 Kris Karas 2009-08-05 18:41:38 UTC
This bug also affects older systems with an Intel 440BX chipset.

I have two production servers with Asus P2B-D motherboards (dual Pentium-III, 440BX chipset) with DAC960 SCSI RAID and ATI 3D Rage Pro graphics (Mach64).  These are very stable systems that have been running 24x7x365 since 1999.
Or rather, they run very stably on any kernel from 2.4.x up through 2.6.29.6;
but kernels 2.6.30 through 2.6.30.4 cause lockups/freezes to occur.

The lockups appear to occur randomly, anywhere from several hours to several days following a reboot.  When the system locks, it locks hard; nothing is printed on the screen (no OOPS report), nothing gets written to disk (no log files), and handy debugging aids such as <Alt>-<SysRq>-<...> have no effect.

There was a fix posted circa 2.6.30.2 to fix a locking problem with graphics; and these two system both use the ATY framebuffer in console mode (there is no X).  However, that particular "fix" has no effect on this locking bug.
Comment 60 David Hill 2009-08-06 02:25:08 UTC
I also have an ATI card in my system.

I'm currently bisecting the kernel. I should be able to provide more info shortly.


13 steps huh?

It's like 13 hours of compilation time + reboot + wait until it crashes (or not).  So it will take many days before I can come up with something.

I'll keep you posted.
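
A rough back-of-the-envelope estimate for the bisection, as a sketch. The 45-minute build time comes from an earlier comment and the 13 steps from git's own estimate; the 6-hour soak time per kernel is an assumption for illustration, and the helper name is made up.

```shell
#!/bin/sh
# Number of halvings needed to narrow n candidate commits down to one.
bisect_steps() {
    n=$1; steps=0; span=1
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        steps=$((steps + 1))
    done
    echo "$steps"
}

STEPS=13          # as reported by git bisect
BUILD_MIN=45      # kernel compile time mentioned earlier in this report
SOAK_MIN=360      # assumed: run each kernel for 6 hours before calling it good

total_min=$((STEPS * (BUILD_MIN + SOAK_MIN)))
echo "a range of 8192 commits needs $(bisect_steps 8192) steps"
echo "estimated total: $((total_min / 60)) h $((total_min % 60)) min"
```

With these numbers the total comes to 87 h 45 min, i.e. between three and four days of wall-clock time. It is an upper bound of sorts, since a bad kernel usually hangs well before the soak time is over.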
Comment 61 David Hill 2009-08-06 03:36:47 UTC
I just rebooted with kernel v2.6.29-06560-g3c6fae6
[3c6fae67d026d57f64eb3da9c0d0e76983e39ae3] Merge branch 'hwmon-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6
Comment 62 David Hill 2009-08-07 02:37:48 UTC
I just rebooted with kernel vLinux 2.6.29-03241-ga841
[a8416961d32d8bb757bcbb86b72042b66d044510] Merge branch 'irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
Comment 63 Roland Kletzing 2009-08-10 18:41:13 UTC
possibly related issues: http://bugzilla.kernel.org/show_bug.cgi?id=13933
Comment 64 David Hill 2009-08-10 19:04:38 UTC
My system is also dual P3 500MHZ...
I'm currently bisecting this bug!
I'll keep you posted...

Thank you
Comment 65 Igor M Podlesny 2009-08-10 19:08:42 UTC
(In reply to comment #64)
[...]
> I'm currently bisecting this bug!
[...]

Guys, BTW, you could have collaborated on testing bisected versions. :-) 2 or 3 testers would speed up the process significantly.
Comment 66 David Hill 2009-08-10 19:55:26 UTC
How do we do this?  It's my first time!!!
Comment 67 Igor M Podlesny 2009-08-10 19:58:28 UTC
(In reply to comment #66)
> How do we do this?  It's my first time!!!

	Just the same way you're already doing that -- compile several bisected versions at once (using different .configs) and hand it to other testers.
Comment 68 David Hill 2009-08-12 00:54:28 UTC
(In reply to comment #62)
> I just rebooted with kernel vLinux 2.6.29-03241-ga841
> [a8416961d32d8bb757bcbb86b72042b66d044510] Merge branch 'irq-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

This one is good!
Comment 69 David Hill 2009-08-12 01:36:55 UTC
I just rebooted with kernel v2.6.29-05048-g0fe41b8 ...
[0fe41b8982001cd14ee2c77cd776735a5024e98b] Merge branch 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm
Comment 70 David Hill 2009-08-14 04:45:02 UTC
Still bisecting ...  8 steps (estimated) remaining.
Comment 71 Roland Kletzing 2009-08-15 15:01:08 UTC
David, i just read about the eth1/RTL8139 issue.
Is that card still in the system, or even in use, while you have the other crashes? Did you try removing that NIC or disabling it completely, just to see if this makes a difference?
Comment 72 David Hill 2009-08-16 18:12:59 UTC
Remove it, no.
And this server is a gateway... Disabling the NIC will cut the Internet access!!!

But I'm still bisecting the kernel to find where the bug appears...

Comment 73 Kris Karas 2009-08-17 14:38:15 UTC
The lockups I am seeing (on an ASUS P2B-D mobo, 440BX chipset) are not related to Realtek ethernet.  Both of my systems have a 3c509 (single NIC) and an Intel EtherExpress Pro 100 (dual NIC) card in them.
Comment 74 David Hill 2009-08-17 15:23:35 UTC
Great news ... or bad ones?!

Comment 75 David Hill 2009-08-21 01:48:06 UTC
I'm not quite sure my bisection is going well...

If I get many consecutive bad results, isn't that a bad sign?



git bisect start
# good: [8e0ee43bc2c3e19db56a4adaa9a9b04ce885cd84] Linux 2.6.29
git bisect good 8e0ee43bc2c3e19db56a4adaa9a9b04ce885cd84
# bad: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30
git bisect bad 07a2039b8eb0af4ff464efd3dfd95de5c02648c6
# bad: [3c6fae67d026d57f64eb3da9c0d0e76983e39ae3] Merge branch 'hwmon-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6
git bisect bad 3c6fae67d026d57f64eb3da9c0d0e76983e39ae3
# good: [a8416961d32d8bb757bcbb86b72042b66d044510] Merge branch 'irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
git bisect good a8416961d32d8bb757bcbb86b72042b66d044510
# bad: [0fe41b8982001cd14ee2c77cd776735a5024e98b] Merge branch 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm
git bisect bad 0fe41b8982001cd14ee2c77cd776735a5024e98b
# good: [9759d22c8348343b0da4e25d6150c41712686c14] Merge branch 'master' into devel
git bisect good 9759d22c8348343b0da4e25d6150c41712686c14
# bad: [881c47760bc66b43360337da37d2a9de4af865b0] Merge branch 'x86/cleanups' into x86/core
git bisect bad 881c47760bc66b43360337da37d2a9de4af865b0
# bad: [0a7e8c64142b2ae5aacdc509ed112b8e362ac8a4] x86, genapic: cleanup 32-bit apic_default template
git bisect bad 0a7e8c64142b2ae5aacdc509ed112b8e362ac8a4
# bad: [5cdc5e9e69d4dc3a3630ae1fa666401b2a8dcde6] x86: fully honor "nolapic", fix
git bisect bad 5cdc5e9e69d4dc3a3630ae1fa666401b2a8dcde6
# bad: [f10fcd47120e80f66665567dbe17f5071c7aef52] x86: make early_per_cpu() a lvalue and use it
git bisect bad f10fcd47120e80f66665567dbe17f5071c7aef52
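
A run of consecutive "bad" verdicts is not by itself a sign of a broken bisection; it only means the culprit keeps falling into the older half of the remaining range. What can mislead a bisection is a wrong "good" (a kernel that simply had not hung yet). As a sketch, the verdicts in the log above can be tallied like this; the function and variable names are made up for illustration.

```shell
#!/bin/sh
# Verdict lines copied from the bisect log above (comment lines omitted).
bisect_log() {
    cat <<'EOF'
git bisect good 8e0ee43bc2c3e19db56a4adaa9a9b04ce885cd84
git bisect bad 07a2039b8eb0af4ff464efd3dfd95de5c02648c6
git bisect bad 3c6fae67d026d57f64eb3da9c0d0e76983e39ae3
git bisect good a8416961d32d8bb757bcbb86b72042b66d044510
git bisect bad 0fe41b8982001cd14ee2c77cd776735a5024e98b
git bisect good 9759d22c8348343b0da4e25d6150c41712686c14
git bisect bad 881c47760bc66b43360337da37d2a9de4af865b0
git bisect bad 0a7e8c64142b2ae5aacdc509ed112b8e362ac8a4
git bisect bad 5cdc5e9e69d4dc3a3630ae1fa666401b2a8dcde6
git bisect bad f10fcd47120e80f66665567dbe17f5071c7aef52
EOF
}

good=$(bisect_log | grep -c '^git bisect good ')
bad=$(bisect_log | grep -c '^git bisect bad ')
# Length of the run of "bad" verdicts at the end of the log.
trailing_bad=$(bisect_log | awk '
    $3 == "bad"  { run++ }
    $3 == "good" { run = 0 }
    END          { print run + 0 }')

echo "good=$good bad=$bad trailing_bad=$trailing_bad"
```

If one of the "good" verdicts is in doubt, the log can be saved with git bisect log, edited to drop or change the doubtful line, and fed back with git bisect replay on the edited file, which avoids redoing the trusted steps.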
Comment 76 Roland Kletzing 2009-08-21 17:50:06 UTC
can you please try latest git and confirm if the problem is fixed ?
( see http://bugzilla.kernel.org/show_bug.cgi?id=13933 )
Comment 77 Rafael J. Wysocki 2009-08-26 20:48:08 UTC
On Wednesday 26 August 2009, David Hill wrote:
> It seems to be fixed in 2.6.31-rc7
> 
> David Hill