Bug 24902
| Summary: | r8169 regression with lockups | | |
|---|---|---|---|
| Product: | Drivers | Reporter: | Jason Newton (nevion) |
| Component: | Network | Assignee: | Francois Romieu (romieu) |
| Status: | RESOLVED INSUFFICIENT_DATA | | |
| Severity: | normal | CC: | akpm, alan, romieu |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 2.6.37rc5 | Subsystem: | |
| Regression: | Yes | Bisected commit-id: | |
| Attachments: | lspci -vvv, dmesg | | |
Description
Jason Newton
2010-12-14 19:15:58 UTC
Created attachment 40162 [details]
lspci -vvv
This message is mostly a symptom: eth0 was not able to send for too long and the network device TX watchdog kicked in. It will not reboot the computer by itself.

Can you try running the little script below?

    #!/bin/sh
    # Snapshot interrupt and slab counters once a minute so the state
    # just before a lockup survives on disk.
    while : ; do
        dir=/tmp/gloo/$(date +%Y%m%d%H%M%S)
        mkdir -p "${dir}"
        cat /proc/interrupts > "${dir}/interrupts"
        cat /proc/slabinfo > "${dir}/slab"
        sync    # flush the snapshot to disk before the next wait
        sleep 60
    done

Please check that your system logger does not operate asynchronously, btw.

Can you add a bit of context, say:
- usual uptime with a 2.6.34 kernel
- a short description of the network usage on both interfaces
- MTU
- complete dmesg, especially the XID line from the r8169 driver
- no overclocking in sight?

I am not comfortable with the proprietary fglrx module. It would be nice to reproduce the problem after a fresh boot without this module.

-- Ueimor

Created attachment 40202 [details]
dmesg
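
The snapshots written by the script requested above can be compared with a small helper. The sketch below is not part of the original thread: `irq_total` and `irq_report` are hypothetical names, and it assumes each snapshot directory under the base directory holds an `interrupts` file copied from /proc/interrupts, as that script produces.

```sh
#!/bin/sh
# Sum the per-CPU counters of every /proc/interrupts line that lists DEVICE.
# usage: irq_total FILE DEVICE
irq_total() {
    awk -v dev="$2" '
        {
            hit = 0
            # Device names come last and are comma-separated on shared IRQs.
            for (i = 2; i <= NF; i++) {
                f = $i
                sub(/,$/, "", f)
                if (f == dev) hit = 1
            }
            # Field 1 is the IRQ label; the numeric fields that follow
            # are the per-CPU counts.
            if (hit)
                for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
                    sum += $i
        }
        END { printf "%.0f\n", sum }
    ' "$1"
}

# Print "snapshot total" for each snapshot directory under BASEDIR.
# Timestamped directory names sort chronologically in the glob.
# usage: irq_report BASEDIR DEVICE
irq_report() {
    for f in "$1"/*/interrupts; do
        [ -f "$f" ] || continue
        printf '%s %s\n' "$(basename "$(dirname "$f")")" "$(irq_total "$f" "$2")"
    done
}
```

Running `irq_report /tmp/gloo eth0` lists one total per snapshot. A total that stops growing between consecutive snapshots while traffic is expected would be worth mentioning in the report, although it does not identify a cause by itself.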
I am running the script now, though I'm remote.

Usual uptime with 2.6.34: 2-3 months, usually ended by a power outage.

eth0: main uplink. Constant trickle with a few hours of relatively intense (1mB+) usage every day. The problem occurs more often during these intense periods, though that also happens to be when I'm using the computer most.

eth1: LAN traffic. It used to carry a lot of traffic all day (this machine serves as a gateway), lately only a few hours of 100-400kB traffic a day, if even that.

Both devices have an MTU of 1500, although upon checking just now eth0 was at 576 for some reason (this interface is dynamically configured).

As for overclocking, yes, this machine (an i7 920) is lightly overclocked but not overvolted or anything. It has been since I got it, and I have never had any problems with the typical benchmarks or strange behavior otherwise (superpi and memcheck have both run for hours without any problems, on top of kernel compiles and countless other workloads). I know it's good troubleshooting to turn it off, but really, I think the probability that this is the culprit is insanely low.

As for getting it to happen with fglrx, I'll see what I can do later tonight.

I use openSUSE 11.3 and syslog-ng. Is there any way to check whether I'm using async logging?

Ok, this is an 8168d (r8169.c::RTL_GIGA_MAC_VER_25).

Before looking any further, you should rebuild your r8169 module with the patches available at:
- http://marc.info/?l=linux-netdev&m=129118104512684
- http://marc.info/?l=linux-netdev&m=129119732929951

-- Ueimor

I found and applied V2 of net-r8169-Remove-the-firmware-of-RTL8111D.patch and the firmware adding patch, compiled and reloaded the module. I'll report back the next time it crashes.

It seems it did it again 2 hours ago (I'm still at work). There is no crash log, though, so it is not certain that it is still the same problem. I'll have to stress test it on the weekend or so, with and without fglrx.
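
The MTU and XID details exchanged above can be collected with two small helpers. This is a sketch under assumptions, not something posted in the thread: `iface_mtu` and `xid_lines` are hypothetical names, `SYSFS_NET` is an illustrative override for trying the function against a copied tree, and matching on the strings `r8169` and `XID` is inferred from the request for "the XID line from the r8169 driver".

```sh
#!/bin/sh
# Print the MTU of an interface as exposed under /sys/class/net.
# usage: iface_mtu IFACE
iface_mtu() {
    cat "${SYSFS_NET:-/sys/class/net}/$1/mtu"
}

# Print the r8169 lines of a saved dmesg file that mention the chip XID.
# usage: xid_lines DMESG_FILE
xid_lines() {
    grep 'r8169' "$1" | grep 'XID'
}
```

For example, `iface_mtu eth0` right after a lease renewal shows whether the dynamically configured interface has dropped to 576 again, and `dmesg > /tmp/dmesg.txt; xid_lines /tmp/dmesg.txt` extracts the line to paste into the report.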
A lockup with r8169.c::RTL_GIGA_MAC_VER_25 has been fixed in the current -git kernel (see 1519e57fe81c14bb8fa4855579f19264d1ef63b4). Can you give it a try?

The current -git kernel includes (not for long) a nasty cast error, but your 8168 revision cannot notice it.

-- Ueimor

I've had a few sudden reboots in the interim, though they are much less likely to occur on 2.6.37 from openSUSE Tumbleweed. They occur a lot more often on the desktop flavour kernel than on generic (desktop won't last the night, generic can last a month or more). I don't really have time to test it out now, maybe in a few weeks.
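
Whether a given kernel source tree already contains the fix named above can be checked with git before building. A minimal sketch, assuming a local clone of the kernel tree and a git version that supports `-C` and `merge-base --is-ancestor`; `tree_has_commit` is a hypothetical helper name.

```sh
#!/bin/sh
# Exit 0 when COMMIT is an ancestor of REV (default HEAD) in the tree at DIR.
# usage: tree_has_commit DIR COMMIT [REV]
tree_has_commit() {
    git -C "$1" merge-base --is-ancestor "$2" "${3:-HEAD}"
}
```

Usage, with the path to the clone as a placeholder: `tree_has_commit /path/to/linux 1519e57fe81c14bb8fa4855579f19264d1ef63b4 && echo "fix present"`. A distribution kernel may carry the fix as a backported patch under a different commit id, in which case this check reports it as absent.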