Bug 6747
Summary: | Network randomly freezes (eth1 usb cable modem) | ||
---|---|---|---|
Product: | Drivers | Reporter: | luminoso (luminoso) |
Component: | USB | Assignee: | Alan Stern (stern) |
Status: | CLOSED CODE_FIX | ||
Severity: | blocking | CC: | akpm, bunk, greg, jgarzik, spam, stern |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.17+ | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 5089 | ||
Attachments: |
kernel configuration file
Bugfix for uhci-hcd in 2.6.17.8
dmesg for bug 6747
from /sys/kernel/debug/usbmon/3t
Try to detect changes to rx_frame_errors
logs generated from patch
2nd log generated from patch
usbnet rx-frame-errors test patch
kernel.log output
Log usbnet errors
output of patch8908
Print rx header
results of patch8912
Bugfix for uhci-hcd: skip to end of TD list during toggle fixup
dmesg output |
Description
luminoso
2006-06-25 09:00:50 UTC
On Sun, 25 Jun 2006 09:02:03 -0700 bugme-daemon@bugzilla.kernel.org wrote:
> Most recent kernel where this bug did not occur: 2.6.17.1 vanilla

I think you mean it _did_ occur in 2.6.17.1. That question is asking what is the most recent kernel in which it did _not_ occur.

What net driver is that machine using? tg3? skge?

The last version in which it did not occur is 2.6.16-ck11 (I used to use ck sources; I only tested vanilla just to be sure).

Drivers:
CONFIG_USB_USBNET=y
CONFIG_USB_NET_CDCETHER=y
CONFIG_USB=y
CONFIG_USB_UHCI_HCD=y
CONFIG_SK98LIN=y

Is this what you are asking for? I will also submit my .config.

Another thing that could be important: I cannot access my modem status page (http://192.168.100.1).

Created attachment 8413
kernel configuration file
I've noticed another thing. With 2.6.16-ck11 my network had no errors:

eth1      Link encap:Ethernet  HWaddr 00:11:80:EB:65:9A
          inet addr:217.129.133.137  Bcast:217.129.143.255  Mask:255.255.240.0
          UP BROADCAST NOTRAILERS RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:699616 errors:0 dropped:0 overruns:0 frame:0
          TX packets:586397 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:666051647 (635.1 Mb)  TX bytes:130582486 (124.5 Mb)

while with 2.6.17.1 I noticed errors:

          RX packets:699616 errors:!!HERE!! dropped:0 overruns:0 frame:0
          TX packets:586397 errors:(and maybe here?) dropped:0 overruns:0

I can re-test it if you want (I am not sure about the TX packets).

I am now using 2.6.17.3 and I noticed that I'm now able to stay connected, but I'm also getting lots of eth1 errors:

eth1      Link encap:Ethernet  HWaddr 00:11:80:EB:65:9A
          inet addr:217.129.133.137  Bcast:217.129.143.255  Mask:255.255.240.0
          UP BROADCAST NOTRAILERS RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:615602 errors:4027 dropped:0 overruns:0 frame:4027
          TX packets:674816 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:319488901 (304.6 Mb)  TX bytes:241017064 (229.8 Mb)

4027 and rising. This bug is now half fixed. I don't know whether I should downgrade to 2.6.16 or keep using 2.6.17 under these conditions. After all, it still crashes my network. I don't know whether the fact that it now takes a little longer should be taken into consideration.

Also, should this be moved to the Networking category instead of the Drivers category?

(Am I alone? lol) I had the same problem with 2.6.17.5; after a downgrade to 2.6.16.16 it disappeared. 2.6.16.26 seems to have the same problem too.

Still not fixed in 2.6.17.17... Please, can I submit any more useful information?
eth1      Link encap:Ethernet  HWaddr 00:11:80:EB:65:9A
          inet addr:217.129.133.137  Bcast:217.129.143.255  Mask:255.255.240.0
          UP BROADCAST NOTRAILERS RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6498 errors:36 dropped:0 overruns:0 frame:36
          TX packets:5639 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2071217 (1.9 Mb)  TX bytes:2857804 (2.7 Mb)

Me again! I tested A LOT of kernel versions and I found where the bug starts. Kernel version <= 2.6.16.23 is CLEAN (I think 2.6.16.23 is the last one in the 2.6.16 series). But starting with kernel >= 2.6.17 I have problems with my modem. So, between 2.6.16.23 and 2.6.17 there is something new that causes this driver to fail. The changelog is very, very extensive and I can't figure out which commit causes this failure. Also, 2.6.17.9 still has this bug. I would like to know if I can do anything more.

Sorry, where I wrote 2.6.16.23 in my last reply, I meant 2.6.16.27.

So, after some help, I think I found the commit:

---
commit e03d72b99e4027504ada134bf1804d6ea792b206
Author: Adrian Bunk <bunk@stusta.de>
Date:   Mon Jan 9 18:34:08 2006 -0800

    [PATCH] drivers/net/sk98lin/: possible cleanups

    This patch contains the following possible cleanups:
    - make needlessly global functions static
    - remove unused code

    Signed-off-by: Adrian Bunk <bunk@stusta.de>
    Cc: Stephen Hemminger <shemminger@osdl.org>
    Cc: Jeff Garzik <jgarzik@pobox.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
---

I don't see anything in this patch that could even remotely cause your problem, unless there's a persistent bug in the driver that wasn't visible before due to pure luck.

Since sk98lin is a driver that will not stay long-term in the kernel, can you check whether the skge driver works for you?

I was just trying to guess which commit made my modem start misbehaving... Yes, you're right, sk98lin is for my network adapter, not for my modem. I will look at the changelog once more...
First, to make sure we are not hunting an already fixed issue:
- still present in 2.6.17.9?
- still present in 2.6.18-rc4?

If you want to find the commit that was causing your problem, you have to do git bisecting:

# install git and cogito on your computer
# clone the 2.6.17 tree:
cg-clone \
  git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.17.y.git
# start bisecting:
cd linux-2.6.17.y
git bisect start
git bisect bad
git bisect good v2.6.16
# start round
cp /path/to/.config .
make oldconfig
make
# install kernel, boot it, check whether it's good or bad, then:
git bisect [bad|good]
# start next round

After about 12 reboots, you'll have found the guilty commit ("... is first bad commit").

More information on git bisecting:
man git-bisect

2.6.18-rc4 has the bug.
OFFTOPIC: and some stupid mouse bug (it jumps around the screen).
2.6.17.9 also has the bug.

After using git, I think I found the bug:

dccf4a48d47120a42382ba526f1a0848c13ba2a4 is first bad commit
commit dccf4a48d47120a42382ba526f1a0848c13ba2a4
Author: Alan Stern <stern@rowland.harvard.edu>
Date:   Sat Dec 17 17:58:46 2005 -0500

    [PATCH] UHCI: use one QH per endpoint, not per URB

    This patch (as623) changes the uhci-hcd driver to make it use one QH
    per device endpoint, instead of a QH per URB as it does now.
    Numerous areas of the code are affected by this. For example, the
    distinction between "queued" URBs and non-"queued" URBs no longer
    exists; all URBs belong to a queue and some just happen to be at the
    queue's head.

    Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

:040000 040000 2210205f1df6153ddaa93cf25eb0300df45b5405 c287ccdd16dae4ef945ecc1e3e58a163ba27ebad M	drivers

I will now restart git for 2.6.17.y and try an option named "revert", just to be sure (I think it can help).

Thanks for the bisecting! Reverting this patch in 2.6.17 seems to require reverting some other patches first.
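The estimate of "about 12 reboots" in the bisecting instructions follows from how bisection works: each good/bad verdict roughly halves the range of suspect commits, so isolating one commit among n candidates takes about ceil(log2 n) rounds, and 12 rounds are enough for up to 4096 commits. The C sketch below computes that worst-case count with integer arithmetic. The helper name `bisect_rounds` is hypothetical and not part of git. Treat the result as an estimate only, since git picks midpoints over the commit graph rather than a flat list, so its real step count can differ slightly.

```c
/* Hypothetical helper: worst-case number of build/boot/test rounds a
 * binary search needs to isolate one commit among n candidates.
 * Equivalent to ceil(log2(n)), computed without floating point. */
static unsigned bisect_rounds(unsigned long n)
{
    unsigned rounds = 0;
    unsigned long remaining = n;

    while (remaining > 1) {
        /* Testing the midpoint leaves at most ceil(remaining / 2)
         * candidates, whichever verdict the test gives. */
        remaining = (remaining + 1) / 2;
        rounds++;
    }
    return rounds;
}
```

For example, `bisect_rounds(4096)` returns 12 and `bisect_rounds(4097)` returns 13. Doubling the number of suspect commits adds only one more reboot.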
But it seems to work the other way: can you create a non-working kernel by applying this patch against 2.6.16.27?

Yes... I tried... but:

# git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.16.y.git
git cherry-pick dccf4a48d47120a42382ba526f1a0848c13ba2a4
fatal: Needed a single revision
Not a single commit dccf4a48d47120a42382ba526f1a0848c13ba2a4
#

The patch you identified (as623) did contain a bug, and a fix for it was added in 2.6.17.8 and 2.6.18-rc4. So your git-bisect result probably doesn't mean anything -- unless there is a second, unidentified bug still present. Can you provide the contents of /proc/bus/usb/devices (you may need to do "mount -t usbfs none /proc/bus/usb" first)?

@luminoso: The bad commit is not in the 2.6.16 tree. In the 2.6.17 git tree, do a

git-show dccf4a48d47120a42382ba526f1a0848c13ba2a4 > /tmp/patch-bad

Then go to the 2.6.16.27 sources (no matter whether they are from a git tree or directly downloaded), do a

patch -p1 < /tmp/patch-bad

and compile this kernel.

@Alan: Since both 2.6.17.9 and 2.6.18-rc4 don't work for him, it seems there is a second bug...

Created attachment 8842
Bugfix for uhci-hcd in 2.6.17.8
In fact there must be two bugs: the known bug in the as623 patch plus whatever
else is still causing problems. The second bug might not be in uhci-hcd,
however.
In any case, it would be a mistake to apply as623 and predecessors without also
applying the bug-fix patch (attached).
cat /proc/bus/usb/devices

T: Bus=05 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=12 MxCh= 2
B: Alloc= 0/900 us ( 0%), #Int= 0, #Iso= 0
D: Ver= 1.10 Cls=09(hub ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=0000 ProdID=0000 Rev= 2.06
S: Manufacturer=Linux 2.6.16 uhci_hcd
S: Product=UHCI Host Controller
S: SerialNumber=0000:00:1d.3
C:* #Ifs= 1 Cfg#= 1 Atr=c0 MxPwr= 0mA
I: If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub
E: Ad=81(I) Atr=03(Int.) MxPS= 2 Ivl=255ms

T: Bus=04 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=12 MxCh= 2
B: Alloc= 0/900 us ( 0%), #Int= 0, #Iso= 0
D: Ver= 1.10 Cls=09(hub ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=0000 ProdID=0000 Rev= 2.06
S: Manufacturer=Linux 2.6.16 uhci_hcd
S: Product=UHCI Host Controller
S: SerialNumber=0000:00:1d.2
C:* #Ifs= 1 Cfg#= 1 Atr=c0 MxPwr= 0mA
I: If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub
E: Ad=81(I) Atr=03(Int.) MxPS= 2 Ivl=255ms

T: Bus=03 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=12 MxCh= 2
B: Alloc= 17/900 us ( 2%), #Int= 1, #Iso= 0
D: Ver= 1.10 Cls=09(hub ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=0000 ProdID=0000 Rev= 2.06
S: Manufacturer=Linux 2.6.16 uhci_hcd
S: Product=UHCI Host Controller
S: SerialNumber=0000:00:1d.1
C:* #Ifs= 1 Cfg#= 1 Atr=c0 MxPwr= 0mA
I: If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub
E: Ad=81(I) Atr=03(Int.) MxPS= 2 Ivl=255ms

T: Bus=03 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=12 MxCh= 0
D: Ver= 1.10 Cls=02(comm.) Sub=00 Prot=00 MxPS=32 #Cfgs= 1
P: Vendor=07b2 ProdID=5100 Rev= 1.01
S: Manufacturer=Broadcom Corporation
S: Product=USB Cable Modem
S: SerialNumber=001180EB659A
C:* #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA
I: If#= 0 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=06 Prot=00 Driver=cdc_ether
E: Ad=85(I) Atr=03(Int.) MxPS= 8 Ivl=64ms
I: If#= 1 Alt= 0 #EPs= 0 Cls=0a(data ) Sub=00 Prot=00 Driver=cdc_ether
I: If#= 1 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=cdc_ether
E: Ad=81(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms
E: Ad=02(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms

T: Bus=03 Lev=01 Prnt=01 Port=01 Cnt=02 Dev#= 3 Spd=1.5 MxCh= 0
D: Ver= 1.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs= 1
P: Vendor=06da ProdID=0003 Rev= 0.00
S: Manufacturer=OMRON
S: Product=87XXUPS
C:* #Ifs= 1 Cfg#= 1 Atr=a0 MxPwr=100mA
I: If#= 0 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=00 Prot=00 Driver=usbhid
E: Ad=81(I) Atr=03(Int.) MxPS= 8 Ivl=20ms

T: Bus=02 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=12 MxCh= 2
B: Alloc= 0/900 us ( 0%), #Int= 0, #Iso= 0
D: Ver= 1.10 Cls=09(hub ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=0000 ProdID=0000 Rev= 2.06
S: Manufacturer=Linux 2.6.16 uhci_hcd
S: Product=UHCI Host Controller
S: SerialNumber=0000:00:1d.0
C:* #Ifs= 1 Cfg#= 1 Atr=c0 MxPwr= 0mA
I: If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub
E: Ad=81(I) Atr=03(Int.) MxPS= 2 Ivl=255ms

T: Bus=01 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#= 1 Spd=480 MxCh= 8
B: Alloc= 0/800 us ( 0%), #Int= 0, #Iso= 0
D: Ver= 2.00 Cls=09(hub ) Sub=00 Prot=01 MxPS=64 #Cfgs= 1
P: Vendor=0000 ProdID=0000 Rev= 2.06
S: Manufacturer=Linux 2.6.16 ehci_hcd
S: Product=EHCI Host Controller
S: SerialNumber=0000:00:1d.7
C:* #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr= 0mA
I: If#= 0 Alt= 0 #EPs= 1 Cls=09(hub ) Sub=00 Prot=00 Driver=hub
E: Ad=81(I) Atr=03(Int.) MxPS= 2 Ivl=256ms

Thanks for all the help! I will now work on the patches... I will reply soon.

I patched the 2.6.16.27 sources with commit dccf4a48d47120a42382ba526f1a0848c13ba2a4 and it was... clean! It should be buggy, right? Then I patched the same sources again with the "Bugfix for uhci-hcd in 2.6.17.8" patch and it was still clean. Should I bisect again? I am still sure that this bug IS present in 2.6.17.9 and 2.6.18-rc4, and I am also sure that it appears between 2.6.16.27 and 2.6.17. This is why I don't trust git-bisect.
Just as an experiment, you could take all the drivers/usb/host/uhci* files from the 2.6.17.9 source tree and copy them into the corresponding 2.6.16.23 directory. I think it should build okay. Then test the resulting 2.6.16.23 kernel with the transplanted uhci-hcd driver. If it works, you'll know the bug is in some other driver.

@Alan: I followed your suggestion but it doesn't build:

  CC      drivers/usb/host/uhci-hcd.o
drivers/usb/host/uhci-hcd.c:53:24: error: pci-quirks.h: No such file or directory
drivers/usb/host/uhci-hcd.c: In function

Whoops... Looks like you also need to copy drivers/usb/host/pci-quirks.h from 2.6.17.9 into the 2.6.16.23 directory. I forgot about that one.

@Alan: I did what you said and... yes, the bug is there! =) Now... what can I do? Will this bug get fixed?

Okay, good. You have definitely proved there is a bug in the 2.6.17.9 uhci-hcd driver. Before it can be fixed, though, we need to find out exactly where it is.

The next step is to use the usbmon facility in 2.6.17.9. Instructions are in the kernel source file Documentation/usb/usbmon.txt. You will need to turn on CONFIG_USB_MON and CONFIG_DEBUG_FS, and you should also set CONFIG_USB_DEBUG. Capture the usbmon data during a test run, and make sure the bug occurs. Then attach it here, along with the dmesg log for the same period.

Created attachment 8859
dmesg for bug 6747 from 2.6.17.9 kernel

Created attachment 8860
from /sys/kernel/debug/usbmon/3t
from kernel 2.6.17.9
When I started using usbmon:

eth1      Link encap:Ethernet  HWaddr 00:11:80:EB:65:9A
          inet addr:217.129.133.137  Bcast:217.129.143.255  Mask:255.255.240.0
          UP BROADCAST NOTRAILERS RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:564 errors:3 dropped:0 overruns:0 frame:3
          TX packets:336 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:233158 (227.6 Kb)  TX bytes:37243 (36.3 Kb)

At the end:

eth1      Link encap:Ethernet  HWaddr 00:11:80:EB:65:9A
          inet addr:217.129.133.137  Bcast:217.129.143.255  Mask:255.255.240.0
          UP BROADCAST NOTRAILERS RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1908 errors:12 dropped:0 overruns:0 frame:12
          TX packets:1565 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1588133 (1.5 Mb)  TX bytes:166834 (162.9 Kb)

# cat /proc/bus/usb/devices
T: Bus=03 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=12 MxCh= 0
D: Ver= 1.10 Cls=02(comm.) Sub=00 Prot=00 MxPS=32 #Cfgs= 1
P: Vendor=07b2 ProdID=5100 Rev= 1.01
S: Manufacturer=Broadcom Corporation
S: Product=USB Cable Modem
S: SerialNumber=001180EB659A
C:* #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA
I: If#= 0 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=06 Prot=00 Driver=cdc_ether
E: Ad=85(I) Atr=03(Int.) MxPS= 8 Ivl=64ms
I: If#= 1 Alt= 0 #EPs= 0 Cls=0a(data ) Sub=00 Prot=00 Driver=cdc_ether
I: If#= 1 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=cdc_ether
E: Ad=81(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms
E: Ad=02(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms

If I can do anything more, just ask!

I don't see any indication of problems in the usbmon log, and none in the dmesg log except possibly the line saying "eth1: rxqlen 0 --> 4". Are you sure that the problem occurred and the interface froze while you were collecting the logs? If it did, try again and after the freeze occurs do this:

echo 2 >/sys/module/uhci_hcd/parameters/debug

and then attach a copy of /sys/kernel/debug/uhci/0000:00:1d.1.

By the way, only the last hundred lines or so of the usbmon log are important. You can chop off the rest.

Hi..
I did what you said with the 2.6.17.10 kernel. But with this kernel eth1 doesn't freeze, although it still produces errors (seen in ifconfig). The usbmon output is here: ftp://ftp.ua.pt/incoming/luminoso/2.mon.out.tar.bz2

Do you need more logging?

You're talking about two different problems: the framing errors and the interface freezing. Freezing is almost certainly caused by a bug in uhci-hcd, and that's what I want to fix.

Now go back over your earlier tests. With which versions of the driver does the interface freeze? Did it freeze when you transplanted the 2.6.17 driver back to 2.6.16? Does it freeze under 2.6.18-rc4?

The framing errors you see in ifconfig are a completely different matter. They have nothing to do with uhci-hcd. If you didn't see any of these errors in 2.6.16, it may be a result of changes in the USB network driver. (Unless you can prove that the framing errors never occur in vanilla 2.6.16 but they do occur when you transplant the 2.6.17 version of uhci-hcd to 2.6.16. If that happens then I am wrong.) Adrian Bunk or someone else can tell you more about the framing errors than I can. For the time being, let's just work on the freezing issue.

Status update:
- On 2.6.17.9 eth1 still freezes, but less frequently.
- When I copy the uhci drivers from 2.6.17.* to 2.6.16.27, ifconfig starts showing errors, but I didn't test whether eth1 died.
- The ifconfig errors were always present along with the freezing issue.

It's quite frustrating... it's really random. When I opened this bug it happened within 5 minutes of starting the computer; now it takes more than a day. (It froze yesterday. I had decided: let's use a clean 2.6.17.9! It was giving framing errors but wasn't freezing, so why not use 2.6.17.{9,10}? I deleted all the sources and unpacked a clean tarball again, and while I was not debugging, it happened. Now I am debugging again and waiting with 2.6.17.10.) I am waiting again... I hope to report a new, useful usbmon log soon.
Don't forget to report the results from the test in comment #31 also!

Created attachment 8881
Try to detect changes to rx_frame_errors
Here's a patch for 2.6.17.10 which I hope will tell us exactly when the RX
frame-error count changes. If it works, it will add some very disruptive
messages to the system log every time the value is increased.
The patch will create a new configuration option, CONFIG_KWATCH. You will have
to turn that on and then rebuild the kernel. Unfortunately I can't test the
patch because I don't have a USB ethernet interface.
Created attachment 8886
logs generated from patch
Hi..
I applied the patch successfully to the 2.6.17.10 sources, but I am having
problems turning on 'CONFIG_KWATCH=y' in .config. After editing it and running
"make clean && make", it simply disappears from .config. I think it isn't enabled.
Anyway, I made a tarball of /var/log. Maybe the logs are there and I just didn't
see them.
These logs are clean. I deleted all of them before restarting.
Off-topic: even when using clean tarballs, my kernel is named
2.6.17.10-g9e88eb4d-dirty. How do I get rid of "-g9e88eb4d-dirty"?
The option depends on CONFIG_DEBUG_KERNEL - you also have to set this option to y.

Yes, I forgot to mention CONFIG_DEBUG_KERNEL because I _always_ keep that option turned on for my work!

I don't know where the "-g9e88eb4d-dirty" comes from. Check the topmost Makefile to make sure it's not there. It might have something to do with the way you install the new kernel for booting. What do you get from "uname -a"?

Created attachment 8895
2nd log generated from patch

Hi.. Here are the new logs generated with the patch, now with CONFIG_KWATCH=y working. I also included the .config of the current kernel. I checked whether 'ifconfig' was showing errors, and it looked like this before I made the tarball:

eth1      Link encap:Ethernet  HWaddr 00:11:80:EB:65:9A
          inet addr:217.129.133.137  Bcast:217.129.143.255  Mask:255.255.240.0
          UP BROADCAST NOTRAILERS RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:218676 errors:1225 dropped:0 overruns:0 frame:1225
          TX packets:242498 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:78791751 (75.1 Mb)  TX bytes:89479444 (85.3 Mb)

About the "dirty": it started to appear with Comment #18. From that patch on, all new clean tarballs have "dirty" in the kernel name.

uname -a
Linux khona4 2.6.17.10-g9e88eb4d-dirty #6 SMP PREEMPT Tue Aug 29 13:00:40 WEST 2006 i686 Intel(R) Pentium(R) 4 CPU 2.80GHz GenuineIntel GNU/Linux

Created attachment 8896
usbnet rx-frame-errors test patch
I can't tell what's going on. If that patch was working correctly, we would
have seen any change to the rx frame error counter.
Try this patch instead of 8881. Attaching kern.log alone will be sufficient.
The "dirty" thing might be a leftover from your attempt at using git-bisect.
Maybe something didn't get cleaned up all the way.
Created attachment 8899
kernel.log output
Here it is... I think it worked this time. =)
It did work. At least, the error counter remained at 0. But I didn't change anything! All the patch does is print out the value of the counter whenever a packet is sent or received.

I don't know. Try running with and without the patch (all you have to do is keep two separate copies of usbnet.ko, one built with the patch and one without, and insmod one or the other to switch between them). See if you can get the error counter to change from 0.

Created attachment 8908
Log usbnet errors
I just got a clue, and now I understand a lot better what's going on. Forget
about those previous patches and use this one instead. It will put a message
in the kernel log every time one of those network errors occurs. This way
we'll be able to see exactly what sort of error it is.
Created attachment 8910
output of patch8908
Huuuuugeeeee thanks! I spent all day patching and restarting, trying to find
anything new, but found nothing.
Here are the results (thanks) of the 8908 patch. I will keep it running just
in case you need more.
Created attachment 8912
Print rx header
You don't have to keep running that patch; what you posted is good enough.
It's now clear where the errors are coming from. The driver is receiving
packets that are shorter than 14 bytes (the minimum for Ethernet). I have no
idea why this happens only under 2.6.17 and not under 2.6.16.
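The diagnosis above comes down to a length check. An Ethernet frame has to carry at least a 14-byte header (6-byte destination MAC, 6-byte source MAC, 2-byte EtherType), so any received buffer shorter than that cannot be a valid frame. The C sketch below shows such a check. The names `rx_stats` and `rx_accept` are hypothetical, and this is not the usbnet code from the patches in this report. The real driver may classify and count these packets differently.

```c
#include <stddef.h>

/* 6-byte destination MAC + 6-byte source MAC + 2-byte EtherType */
#define ETH_HEADER_LEN 14

/* Hypothetical per-interface counters, loosely modelled on the
 * RX packets / frame figures that ifconfig reports. */
struct rx_stats {
    unsigned long rx_packets;
    unsigned long rx_frame_errors;
};

/* Returns 1 and counts the packet if the buffer can hold an Ethernet
 * header; returns 0 and counts a frame error for a runt buffer,
 * which the caller is expected to drop. */
static int rx_accept(struct rx_stats *st, size_t len)
{
    if (len < ETH_HEADER_LEN) {
        st->rx_frame_errors++;
        return 0;
    }
    st->rx_packets++;
    return 1;
}
```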
Here's a different patch to try. This one will log the headers of the first 8
packets received and, after that, the first 8 error packets. I hope that
seeing the header contents will give a clue as to what's wrong.
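The logging scheme described here, which the thread later summarizes as stopping after 16 lines (or after 8 if there are no errors), is a budgeted log. Here is a hypothetical C sketch of it. It assumes the first budget covers packets of any kind and the second covers only error packets seen after that. The names `hdr_logger` and `log_rx_header`, and the choice to dump up to 14 header bytes, are illustrative and are not taken from attachment 8912.

```c
#include <stdio.h>
#include <stddef.h>

#define LOG_LIMIT 8   /* budget per phase, as described in the comment */
#define HDR_BYTES 14  /* assumption: dump at most one Ethernet header */

/* Hypothetical logger state: one budget for the first packets of any
 * kind, a second budget for error packets seen after that. */
struct hdr_logger {
    unsigned first_logged;
    unsigned errors_logged;
};

/* Prints the leading bytes of a received buffer if a budget allows it.
 * Returns 1 if a line was printed, 0 if the packet was skipped. */
static int log_rx_header(struct hdr_logger *lg, const unsigned char *buf,
                         size_t len, int is_error)
{
    size_t i;
    size_t n = len < HDR_BYTES ? len : HDR_BYTES;

    if (lg->first_logged < LOG_LIMIT) {
        lg->first_logged++;            /* phase 1: any packet */
    } else if (is_error && lg->errors_logged < LOG_LIMIT) {
        lg->errors_logged++;           /* phase 2: error packets only */
    } else {
        return 0;                      /* budgets spent, or not an error */
    }

    printf("rx len %zu%s:", len, is_error ? " (error)" : "");
    for (i = 0; i < n; i++)
        printf(" %02x", (unsigned)buf[i]);
    printf("\n");
    return 1;
}
```

Once both budgets are used up, the function stays silent while the driver's error counter keeps increasing. That matches the behaviour discussed at the end of the thread.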
Created attachment 8913
results of patch8912
Here it is... it slowed down after about 45 minutes.
Created attachment 8920
Bugfix for uhci-hcd: skip to end of TD list during toggle fixup
I think we found it. This patch fixes a definite bug in uhci-hcd, and it's
almost certainly the bug causing your problems. Try it and see, but keep the
previous patch installed as well just in case.
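Some background on what a "toggle fixup" involves. On USB bulk and interrupt endpoints, successive data packets alternate between DATA0 and DATA1 PIDs. In UHCI, each transfer descriptor (TD) carries one packet along with its own toggle bit. When TDs are removed from an endpoint's queue, the toggle bits of the TDs that remain must be rewritten so the sequence stays continuous. The value the next URB starts with depends on the *last* TD of the previous URB, which is presumably why the fix's title talks about skipping to the end of the TD list.

The C sketch below only illustrates that invariant. It uses a hypothetical array-based `struct td` and a hypothetical `fixup_urb_toggles`. The real uhci-hcd code walks linked lists of TDs and only rewrites TDs that have not yet executed.

```c
#include <stddef.h>

/* Hypothetical, simplified transfer descriptor: only the data toggle
 * bit (0 = DATA0, 1 = DATA1) matters for this sketch. */
struct td {
    unsigned toggle;
};

/* Rewrites the toggle bits of one URB's TDs so they alternate starting
 * at `toggle`, and returns the toggle the next URB must start with.
 * That return value depends on the last TD, so the whole list has to
 * be walked to its end. */
static unsigned fixup_urb_toggles(struct td *tds, size_t ntds, unsigned toggle)
{
    size_t i;

    toggle &= 1;                 /* only DATA0 / DATA1 are valid */
    for (i = 0; i < ntds; i++) {
        tds[i].toggle = toggle;
        toggle ^= 1;             /* each packet flips DATA0 <-> DATA1 */
    }
    return toggle;
}
```

Passing each call's return value into the call for the next URB on the same endpoint keeps the toggle sequence continuous across the queue.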
Created attachment 8921
dmesg output
IT WORKS!!!!!!!!!!! :)
dmesg still prints the attached messages, but only at startup.
Maybe it's relevant: yesterday, when testing patch 8912, the errors kept rising
even after it stopped showing new information in kernel.log (like today).
But now there are no errors in ifconfig. Works perfectly! :)
Patch 8912 was designed to stop after printing 16 lines (or after 8 lines if there are no errors). I didn't need any more debugging info than that, and so of course you saw the errors continuing to increase even after the patch had stopped printing.

You can drop 8912 now. It isn't needed, because 8920 fixes the problem. I'll submit 8920 for inclusion in the 2.6.17.x stable series and also for 2.6.18. If you want, you can apply 8920 to 2.6.18-rc4 (or -rc5) right now. It should apply to that kernel with no changes and it should fix exactly the same problem.

Since everything is now working, I'll close this bug report.

OK, thanks for everything!!! (Should I also report the mouse-jumping bug that I found (I'm a sad sad man lol) in 2.6.18-rc4?)

If it's still present in 2.6.18-rc5, go ahead and report it. But start a separate Bugzilla entry.

Here it is: bug 7008

Once again, thanks for everything.