Distribution: Debian 3.1 (sarge)
Hardware Environment: Not significant for this bug; tested on dozens of different machines.
Software Environment: Standard router with an iptables firewall.
Problem Description: When the system has any network card that uses the SysKonnect SK-98xx driver (CONFIG_SK98LIN=y), and that card has a cable connected, an IP address configured, and traffic passing through it, the machine freezes completely after 14-15 days of uptime. Tested cards: Marvell Technology Group Ltd. Yukon Gigabit Ethernet 10/100/1000Base-T Adapter (rev 13) and 3Com Corporation 3c940 10/100/1000Base-T [Marvell] (rev 10). When I connect a keyboard to the frozen machine, it does not react; even the NumLock LED does not toggle. The only way out is to press the reset button. Each machine hangs quite precisely after 14-16 days. I am also sure this is related to the SK-98xx driver, because when I remove that network card and install a different one, the problem goes away.
Steps to reproduce: Compile a kernel with SK-98xx support, boot it, set up the interface, and wait about two weeks; the system should freeze.
Note: This might be a duplicate of #6277. (Sorry, I'm tired of driving miles from home to reboot a couple of machines; besides, these machines need to be stable: they are network routers.)
Another note: these systems do not use any traffic shaping (neither HTB nor CBQ), but the problem still occurs.
Strange. If it were 47 days then I'd say it's due to a jiffies rollover, but nothing much happens after 14 days. Possibly a packet count rollover? Do you have the NMI watchdog enabled? Add "nmi_watchdog=1" to the kernel boot command line. That'll get us a trace if the machine has any life at all left in it. You'd need a serial console or a digital camera to record it, though.
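To put the jiffies remark in numbers: a 32-bit jiffies counter advances HZ times per second, so it wraps after 2^32 / HZ seconds. The sketch below is plain back-of-the-envelope arithmetic, not kernel code; the HZ values are just common configuration choices, and which one these routers actually used is not stated in the report.

```python
SECONDS_PER_DAY = 86400


def wrap_days(bits: int, hz: int) -> float:
    """Days until an unsigned counter of `bits` width, incremented `hz`
    times per second and starting from 0, wraps back to zero."""
    return (2 ** bits) / hz / SECONDS_PER_DAY


# Common HZ settings; the actual value on the affected machines is unknown.
for hz in (100, 250, 1000):
    print(f"HZ={hz:4d}: 32-bit jiffies wraps after {wrap_days(32, hz):6.1f} days")
```

With HZ=1000 the period is about 49.7 days, close to the figure quoted above and far from the reported 14-16 days, which is why a plain jiffies wrap looks unlikely. (Linux also starts jiffies just below the wrap point via INITIAL_JIFFIES, so the first wrap happens a few minutes after boot; the calculation only gives the wrap period.)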
I don't think this is a packet count rollover: I use these network cards on machines with very different loads (one transfers about 800 Gb per day, another only about 10 Gb), and they all freeze after the same 14-16 days. I don't have the NMI watchdog enabled, and unfortunately I have neither a serial console nor a digital camera.
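The load argument can be checked with the same kind of arithmetic: a counter bumped once per packet wraps after 2^bits / (packets per day) days, so its wrap time scales inversely with traffic. This sketch assumes the reported daily volumes are gigabytes and uses a hypothetical average packet size of 1000 bytes; both are illustrative assumptions, not figures from the report.

```python
# Hypothetical: the daily volumes are read as gigabytes, and an average
# packet size of 1000 bytes is assumed purely for illustration.
AVG_PACKET_BYTES = 1000


def days_to_wrap(bits: int, packets_per_day: float) -> float:
    """Days until an unsigned counter of `bits` width, incremented once
    per packet and starting from 0, wraps back to zero."""
    return 2 ** bits / packets_per_day


for label, gb_per_day in (("busy router", 800), ("quiet router", 10)):
    packets_per_day = gb_per_day * 1e9 / AVG_PACKET_BYTES
    print(f"{label}: 32-bit packet counter wraps after "
          f"{days_to_wrap(32, packets_per_day):.1f} days")
```

Under these assumptions the busy router's 32-bit packet counter would wrap after about 5.4 days and the quiet one's after about 429.5 days. An 80-fold spread in load gives an 80-fold spread in wrap times, which does not fit freezes that arrive after roughly the same uptime on every machine.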
Is there perhaps a way or tool to reset the device statistics manually, without a reboot? I mean something like resetting the /proc/interrupts counts and the transmitted byte and packet counters. Any ideas?
Does the problem also happen with the skge driver? The sk98lin driver contains a lot of messy, unnecessary private management code. The skge driver supports the same hardware and is actively supported, whereas sk98lin is not supported by the kernel community and is scheduled to be made obsolete.
I'll test with the skge driver and post the results.
With the skge driver my systems no longer freeze at all; the problem seems to be gone.
Since skge now supersedes sk98lin, and the problem is probably in the vendor MIB portion of the old driver, let's just accelerate its removal.