Bug 13991

Summary: e100 Freezes Kernel-2.6.30
Product: Drivers
Component: Network
Reporter: Roger (rogerx.oss)
Assignee: drivers_network (drivers_network)
CC: devzero, rogerx.oss
Status: CLOSED DUPLICATE
Severity: blocking
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.30.4
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg
lspci-2.6.29
dmesg-2.6.29
lspci-2.6.30.log
dmesg-2.6.30.log

Description Roger 2009-08-16 00:39:05 UTC
The e100 driver (e100.c) now appears to hard-freeze the 2.6.30 kernel. (All 2.6.30.* kernels have had this problem here; I finally got around to partially isolating it over the past few days using 2.6.30.4.)

I caught this once, just before a freeze:
Aug 14 16:52:33 localhost2 e100: eth0 NIC Link is Down
Aug 14 16:52:35 localhost2 e100: eth0 NIC Link is Up 100 Mbps Half Duplex
Aug 14 16:53:03 localhost2 e100: eth0 NIC Link is Down
Aug 14 16:53:07 localhost2 e100: eth0 NIC Link is Up 100 Mbps Half Duplex
Aug 14 17:01:02 localhost2 e100: eth0 NIC Link is Down
Aug 14 17:01:14 localhost2 e100: eth0 NIC Link is Up 100 Mbps Half Duplex
Aug 14 17:03:16 localhost2 e100: eth0 NIC Link is Down
Aug 14 17:03:18 localhost2 e100: eth0 NIC Link is Up 100 Mbps Half Duplex


Every other time, nothing at all was printed to the logs before the freeze. (In other words, the lockup is severe enough that debug printk output never reaches the log file.)

Even the serial TTY console froze.

Not tainted.

I then got 16+ hours of uptime last night and today after building e100.c as a module and blacklisting it from loading. Granted, this prevents quite a few init services from starting, but given all the patching since 2.6.29, I'm fairly confident the cause is within e100.

I've briefly experimented with kgdb, but couldn't get it working over serial. If anybody has other pointers for catching a bug this severe, let me know. I've already compiled 2.6.30.4 with fomit-framepointers and extra debug info (-g), including kgdb; none of this gives me anything more in syslog, and kgdb isn't working here yet.

(If I get time, I'll simply copy e100.c from 2.6.29 into 2.6.30 and test it fully, but that's unlikely before winter starts here.)
Comment 1 Roland Kletzing 2009-08-16 10:24:56 UTC
Thanks for the report and analysis.

Could you provide information about your system (dmesg, lspci / lspci -vv)?

Are you now using another NIC instead of the e100, or are you running the system without networking?

Keep in mind that this could also be a PCI bus lockup or something else. We have a pile of other bug reports about system freezes with 2.6.30 kernels, so details about your system are important.
Comment 2 Roger 2009-08-19 04:31:06 UTC
Created attachment 22770
dmesg

I'll post 2.6.30 versions of both dmesg and lspci when I get another chance. Currently, 2.6.30 locks up whenever it feels like (even during boot!).
Comment 3 Roger 2009-08-19 04:32:20 UTC
Created attachment 22771
lspci-2.6.29

My gut feeling, too: since the e100 in my laptop works fine, on this i815-chipset machine it just might be PCI-related. But this is *all* just *guessing*!
Comment 4 Roger 2009-08-19 04:33:31 UTC
Created attachment 22772
dmesg-2.6.29
Comment 5 Roger 2009-08-19 15:33:53 UTC
Created attachment 22775
lspci-2.6.30.log

(Just searched for e100 today and found a ton of threads about e100 lockups.)
Comment 6 Roger 2009-08-19 15:34:26 UTC
Created attachment 22776
dmesg-2.6.30.log
Comment 7 Roland Kletzing 2009-08-21 17:44:43 UTC
Can you please check with the latest git to make sure this is NOT related to bug #13933 (http://bugzilla.kernel.org/show_bug.cgi?id=13933)?
Comment 8 Roger 2009-08-21 21:06:33 UTC
Thanks for the follow-up; I already saw the latest notes on bug #13933. I just got KGDB working here, and it looks like the hang is in a uid kobject call in the kernel (during network start), which does seem linked, judging by the code fix.

Looks like a domino effect. If I get time, I'll check the latest git, but I'll likely have to wait a few days for the next minor patch release before I can test, rather than finding time to pull the latest git.

Don't worry, if it does fix this bug, I'll close it asap.
Comment 9 Roger 2009-08-24 20:14:21 UTC
After 14 hours 20 minutes of uptime, I'm now pretty sure this is a duplicate of bug #13933 (http://bugzilla.kernel.org/show_bug.cgi?id=13933).

*** This bug has been marked as a duplicate of bug 13933 ***
Comment 10 Roger 2009-08-25 19:40:08 UTC
12+ more hours of consistent uptime with e100 statically compiled into the kernel.

... closing.