Bug 6780 - System freezes after 2 weeks when using SysKonnect SK-98xx based network card.
Summary: System freezes after 2 weeks when using SysKonnect SK-98xx based network card.
Status: REJECTED WILL_NOT_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: i386 Linux
Importance: P2 high
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-07-01 09:16 UTC by Arnoldas
Modified: 2006-07-22 14:03 UTC
CC List: 2 users

See Also:
Kernel Version: tested from 2.6.11 up to 2.6.17.3
Subsystem:
Regression: ---
Bisected commit-id:


Description Arnoldas 2006-07-01 09:16:29 UTC
Distribution: Debian sarge 3.1
Hardware Environment: not significant for this bug; tested on tens of different 
machines
Software Environment: Standard router with iptables firewall.
Problem Description: When the system has a network card that uses the
SysKonnect SK-98xx driver (CONFIG_SK98LIN=y), and that card has a network cable
attached, an IP address configured, and traffic flowing, the machine freezes
completely after 14-15 days of uptime. Tested with the Marvell Technology Group
Ltd. Yukon Gigabit Ethernet 10/100/1000Base-T Adapter (rev 13) and the 3Com
Corporation 3c940 10/100/1000Base-T [Marvell] (rev 10) network cards.

When I connect a keyboard to the machine, it doesn't react; even the NumLock
LED does not turn on or off. The only solution is to press the reset button.

I've noticed that each machine hangs very consistently, after 14-16 days.

I am also sure this is related to the SK-98xx driver, because when I remove the
mentioned network card and install another one, the problem goes away.

Steps to reproduce: Compile a kernel supporting SK-98xx, boot it, set up the 
interface and just wait for about 2 weeks--the system should freeze.


Note: this might be a duplicate of #6277 (sorry, I'm tired of driving miles
away from my home in order to reboot a couple of machines; besides, those
machines need stability: they're network routers).
Comment 1 Arnoldas 2006-07-01 09:18:53 UTC
Some more notes: those systems are not using any kind of shaping system
(neither HTB nor CBQ), but the problem still occurs.
Comment 2 Andrew Morton 2006-07-01 14:48:00 UTC
Strange.  If it was 47 days then I'd say it's due to a jiffies
rollover.  But nothing much happens after 14 days.  Possibly a packet
count rollover?

Do you have the NMI watchdog enabled?  Add `nmi_watchdog=1' to the
kernel boot command line.  That'll get us a trace if the machine
has any life at all left in it.  You'd need a serial console
or a digital camera to record it though.
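
The rollover arithmetic behind this comment can be sketched as follows. This is a minimal illustration, not taken from the report: the tick rates (100, 250 and 1000 Hz) are assumed here as typical kernel HZ settings, and the helper names are hypothetical. Under those assumptions a 32-bit tick counter at 1000 Hz wraps after roughly 49.7 days, which is in the neighbourhood of the figure mentioned above, while a 14-16 day period would need a 32-bit counter incremented roughly 3100 to 3550 times per second.

```python
SECONDS_PER_DAY = 86_400


def wrap_days(bits, rate_hz):
    """Days until a counter of `bits` bits, incremented `rate_hz` times per
    second, wraps around to zero."""
    return (2 ** bits) / rate_hz / SECONDS_PER_DAY


def implied_rate_hz(bits, days):
    """Increment rate (per second) a `bits`-bit counter would need in order
    to wrap after exactly `days` days."""
    return (2 ** bits) / (days * SECONDS_PER_DAY)


if __name__ == "__main__":
    # Assumed tick rates; the report does not say which HZ the routers use.
    for hz in (100, 250, 1000):
        print(f"32-bit counter at {hz:4d} Hz wraps after "
              f"{wrap_days(32, hz):6.1f} days")

    # What rate would match the observed 14-16 day freeze interval?
    for days in (14, 16):
        print(f"wrap after {days} days needs about "
              f"{implied_rate_hz(32, days):.0f} increments per second")
```

None of the assumed tick rates lands near two weeks, which fits the remark that nothing obvious happens after 14 days.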
Comment 3 Arnoldas 2006-07-01 23:39:30 UTC
I don't think this is a packet count rollover, because I use these network
cards on differently loaded machines (one transfers about 800 Gbs per day while
another transfers only about 10) and the problem still exists.
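
The reasoning above can be checked with a little arithmetic. The sketch below is illustrative only: it reads "Gbs" as gigabytes and assumes an average packet size of 500 bytes, neither of which is stated in the report, and the function name is hypothetical. Whatever the exact units, the two machines differ by a factor of 80 in traffic, so a traffic-driven 32-bit counter could not wrap after the same 14-16 days on both.

```python
GB = 10 ** 9                 # assumption: "Gbs" means gigabytes
AVG_PACKET_BYTES = 500       # hypothetical average packet size


def days_until_wrap(units_per_day, bits=32):
    """Days until a `bits`-bit counter wraps when it grows by
    `units_per_day` each day."""
    return (2 ** bits) / units_per_day


busy_bytes_per_day = 800 * GB    # the heavily loaded router
quiet_bytes_per_day = 10 * GB    # the lightly loaded router

busy_packet_wrap = days_until_wrap(busy_bytes_per_day / AVG_PACKET_BYTES)
quiet_packet_wrap = days_until_wrap(quiet_bytes_per_day / AVG_PACKET_BYTES)

if __name__ == "__main__":
    print(f"busy router:  packet counter wraps after {busy_packet_wrap:.1f} days")
    print(f"quiet router: packet counter wraps after {quiet_packet_wrap:.1f} days")
    print(f"ratio: {quiet_packet_wrap / busy_packet_wrap:.0f}x")
```

The ratio between the two wrap times does not depend on the assumed packet size or unit, only on the 800:10 traffic ratio.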

I don't have the NMI watchdog enabled. Sorry, but I have neither a serial
console nor a digital camera.
Comment 4 Arnoldas 2006-07-06 13:35:55 UTC
Maybe there is some way or tool to reset device statistics manually without a
reboot? I mean something like resetting /proc/interrupts and the transmitted
bytes and packets counters. Any ideas?
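
Whether these counters can be reset without a reboot is not answered in this report. As a partial alternative, the interface counters can at least be watched to see how close they are to a 32-bit wrap. The sketch below parses text in the /proc/net/dev layout; the function names and the sample data are made up for illustration, and the 32-bit limit is an assumption based on the i386 platform named in the report.

```python
COUNTER_LIMIT = 2 ** 32  # assumption: 32-bit counters, as on i386


def parse_net_dev(text):
    """Parse /proc/net/dev style text into
    {iface: {"rx_bytes", "rx_packets", "tx_bytes", "tx_packets"}}."""
    stats = {}
    for line in text.splitlines()[2:]:      # first two lines are headers
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or len(fields) < 10:
            continue                        # skip malformed lines
        stats[name.strip()] = {
            "rx_bytes": int(fields[0]),     # receive columns start at 0
            "rx_packets": int(fields[1]),
            "tx_bytes": int(fields[8]),     # transmit columns start at 8
            "tx_packets": int(fields[9]),
        }
    return stats


def wrap_headroom(value, limit=COUNTER_LIMIT):
    """Fraction of the counter range left before the next wrap."""
    return 1.0 - (value % limit) / limit


# Made-up sample in the /proc/net/dev layout, for demonstration only.
SAMPLE = "\n".join([
    "Inter-|   Receive                            |  Transmit",
    " face |bytes packets errs drop fifo frame compressed multicast"
    "|bytes packets errs drop fifo colls carrier compressed",
    "    lo: 1234 12 0 0 0 0 0 0 1234 12 0 0 0 0 0 0",
    "  eth0:4000000000 1000 0 0 0 0 0 0 123456 42 0 0 0 0 0 0",
])

if __name__ == "__main__":
    for iface, counters in parse_net_dev(SAMPLE).items():
        left = wrap_headroom(counters["rx_bytes"])
        print(f"{iface}: rx_bytes={counters['rx_bytes']} "
              f"({left:.1%} of the range left before a wrap)")
```

On a live system the same parser could be fed the contents of /proc/net/dev, for example from a periodic job that logs the values so the last readings before a freeze survive on disk.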
Comment 5 Stephen Hemminger 2006-07-13 09:36:01 UTC
Does the problem happen with the skge driver? The sk98lin driver
has a lot of messy private management code that is unnecessary.

The skge driver supports the same hardware, and is supported.
The sk98lin driver is not supported by the kernel community and
is planned to be obsoleted.
Comment 6 Arnoldas 2006-07-13 12:09:21 UTC
I'll try to test with the skge driver and then I'll post the results.
Comment 7 Arnoldas 2006-07-21 07:27:04 UTC
With this driver my systems do not freeze any more at all; the problem seems
to be gone.

Comment 8 Stephen Hemminger 2006-07-22 14:03:59 UTC
Since skge supersedes sk98lin now, and the problem is probably
in the vendor MIB portion of the driver, let's just accelerate removal
of the old driver.
