Bug 4836
Summary: | enabling CONFIG_LLC=y crashes on llc_station_ac_send_test_r | ||
---|---|---|---|
Product: | Networking | Reporter: | Michael Goldman (haizaar) |
Component: | Other | Assignee: | Arnaldo Carvalho de Melo (acme) |
Status: | DEFERRED WILL_FIX_LATER | ||
Severity: | blocking | CC: | akpm |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.12, 2.6.11.11 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: |
crash printout
.config file that was used to compile the kernel
linkloop package tarball, version 0.0.1
patch for linkloop-0.0.1
.config file for kernel 2.6.7
bumps kernel clock to 2 kHz
Adjusts polling time of Rocket Port driver
Makes Rocket Port driver 'export' PCI IDs
Screenshot of the vanilla 2.6.12 crash on HP XW6200 |
Description
Michael Goldman
2005-07-03 15:40:08 UTC
Created attachment 5253
crash printout
This is the output printed on the console when the kernel crashes. Since the system was
already dead, I had to write it down manually.
Created attachment 5254
.config file that was used to compile the kernel
This is the .config for 2.6.12 + genpatches-2.6.12-5.base +
genpatches-2.6.12-5.extras (except for speakup).
Created attachment 5255
linkloop package tarball, version 0.0.1
This is the linkloop package, which enables 'pinging' computers by MAC address if they
run the linkloop_reply daemon.
NOTE: With CONFIG_LLC=y, the kernel crashed even when the machine was not running the
linkloop_reply daemon.
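Since the crash is in llc_station_ac_send_test_r, one plausible reading of the NOTE above is that linkloop sends IEEE 802.2 TEST command frames, which the in-kernel LLC station answers by itself whether or not linkloop_reply is running. The C sketch below only builds such a frame in a buffer, assuming an 802.3 header with a length field followed by a 3-byte LLC U-format header (DSAP, SSAP, control 0xE3 = TEST with the poll bit clear). It is not taken from linkloop; `build_llc_test_frame` is a hypothetical helper, and any SAP values passed to it are placeholders. A maximum-size frame carries a length field of 1500 and 1497 bytes of TEST data, the same numbers (len:1500 put:1497) that appear in the panic output reported later in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ADDR_LEN    6
#define ETH_HDR_LEN     14    /* dst MAC + src MAC + 2-byte 802.3 length field */
#define LLC_U_HDR_LEN   3     /* DSAP + SSAP + 1-byte U-format control */
#define ETH_MAX_PAYLOAD 1500  /* largest LLC PDU the length field may describe */
#define LLC_CTRL_TEST   0xE3  /* TEST command, poll bit clear */

/*
 * Hypothetical helper (not from linkloop): writes an 802.3 frame carrying an
 * LLC TEST command with info_len bytes of test data into buf.
 * Returns the frame length, or 0 if the PDU would exceed 1500 bytes or buf
 * is too small. No padding to the 60-byte Ethernet minimum is added.
 */
size_t build_llc_test_frame(uint8_t *buf, size_t buf_len,
                            const uint8_t dst[ETH_ADDR_LEN],
                            const uint8_t src[ETH_ADDR_LEN],
                            uint8_t dsap, uint8_t ssap,
                            const uint8_t *info, size_t info_len)
{
    assert(buf != NULL && dst != NULL && src != NULL);

    if (info_len > ETH_MAX_PAYLOAD - LLC_U_HDR_LEN)
        return 0;

    size_t pdu_len = LLC_U_HDR_LEN + info_len;   /* value of the length field */
    size_t frame_len = ETH_HDR_LEN + pdu_len;
    if (frame_len > buf_len)
        return 0;

    memcpy(buf, dst, ETH_ADDR_LEN);
    memcpy(buf + ETH_ADDR_LEN, src, ETH_ADDR_LEN);
    buf[12] = (uint8_t)(pdu_len >> 8);           /* length field, big-endian */
    buf[13] = (uint8_t)(pdu_len & 0xff);
    buf[14] = dsap;
    buf[15] = ssap;          /* caller keeps the C/R (low) bit clear for a command */
    buf[16] = LLC_CTRL_TEST;
    if (info_len > 0)                            /* avoid memcpy from NULL */
        memcpy(buf + ETH_HDR_LEN + LLC_U_HDR_LEN, info, info_len);
    return frame_len;
}
```

Actually transmitting the frame would need a PF_PACKET socket, which the sketch deliberately leaves out.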
Created attachment 5256
patch for linkloop-0.0.1
The patch fixes some small bugs, performs more precise checks, enables
broadcast, and uses the new PF_PACKET interface. This patch was applied on the
tested platform.
Another note: the crash 'screenshot' was taken on a ProLiant DL580 machine, but the crash output on XW6200 machines was very similar. I can provide any additional information (lspci, etc.) on request.

It's not clear: could you please describe which patches, if any, were applied to the vanilla 2.6.12 kernel? Are you able to identify earlier 2.6-based kernels which worked OK? If so, which? 2.6.7, I assume. Have you tried anything later?

Also, there should have been some extra debug output from this:

    printk(KERN_EMERG "skb_over_panic: text:%p len:%d put:%d head:%p "
           "data:%p tail:%p end:%p dev:%s\n", here, skb->len, sz, skb->head,
           skb->data, skb->tail, skb->end, skb->dev ? skb->dev->name : "<NULL>");

Can you obtain that?

Ok. Here is more info. I've checked the situation with the 2.6.7 kernel:
1. With CONFIG_LLC=n, no problem occurs.
2. I set CONFIG_LLC=m and CONFIG_LLC2=m and recompiled; now I have two modules: llc and llc2.
3. After rebooting the system (2.6.7), the modules are NOT loaded by default.
4. a. linkloop <MAC>: all OK
   b. modprobe llc: dmesg says nothing new
   c. linkloop <MAC>: still all OK
   d. modprobe llc2: dmesg says 'NET: Registered protocol family 26'
   e. linkloop <MAC>: immediate crash

Note: on 2.6.11/12, LLC was compiled into the kernel, not built as modules.

I have to mention that this 2.6.7-based system has been working in production for about half a year and has proved to be stable. I discovered the problem when I tried to upgrade to a newer kernel and noticed the LLC option in menuconfig. I basically do not use it, but as we see now, 2.6.7 has this bug as well.

Created attachment 5258
.config file for kernel 2.6.7
Note that the 2.6.11/12 kernels were highly modularized, as opposed to 2.6.7.
Created attachment 5259
bumps kernel clock to 2 kHz
Note that this patch was applied to the 2.6.7 kernel only, not to 2.6.11/12.
Created attachment 5260
Adjusts polling time of Rocket Port driver
This patch adjusts the polling time for the Comtrol Rocket Port driver. It was applied
to all kernels.
Created attachment 5261
Makes Rocket Port driver 'export' PCI IDs
This patch makes the driver export the PCI IDs of the devices it supports. This helps to
autodetect Rocket Port cards. This patch was applied to all kernels.
In addition to the attached patches, 2.6.12 (and only 2.6.12) was patched with the latest Gentoo patchset (genpatches-2.6.12-5: http://dev.gentoo.org/~dsd/genpatches/patches-2.6.12-5.htm).

Andrew, can you please explain what exactly you want me to do with this?

    printk(KERN_EMERG "skb_over_panic: text:%p len:%d put:%d head:%p "
           "data:%p tail:%p end:%p dev:%s\n", here, skb->len, sz, skb->head,
           skb->data, skb->tail, skb->end, skb->dev ? skb->dev->name : "<NULL>");

You'll have trouble getting any of the net guys to attend to this with so many kernel patches applied. Please reproduce the bug on a vanilla kernel.org kernel.

Regarding this:

    printk(KERN_EMERG "skb_over_panic: ...

the output from that printk should have appeared on your terminal or in the logs. It might help the net guys fix the bug.

Ok. I've reproduced the bug on VANILLA kernels as well, both 2.6.7 and 2.6.12. The crash occurs under the same conditions in exactly the same way.

Andrew, here is the printk output you asked for:

    skb_over_panic: text:f93e81ae len:1500 put:1497 head:f517de00 data:f517de2f tail:f517e40b end:f517de80 dev:eth0

Created attachment 5268
Screenshot of the vanilla 2.6.12 crash on HP XW6200
skb_over_panic: text:f93e81ae len:1500 put:1497 head:f517de00 data:f517de2f tail:f517e40b end:f517de80 dev:eth0

I'm not familiar with the code or the protocol, but it seems the incoming frame had a size of 1500 bytes and the code tries to copy it into a newly allocated packet (llc_station_ac_send_test_r -> llc_pdu_init_as_test_rsp). However, the newly allocated packet only has 128 - 50 = 78 bytes available, because llc_alloc_frame uses a fixed size. So the fix would be to either allocate a larger frame or copy less data. Arnaldo?

Sorry guys, just not enough bandwidth right now. I'm working on getting a better infrastructure in place, and then I'll go over LLC and make it use this infrastructure. Heck, I have lots of patches at: http://www.kernel.org/pub/linux/kernel/people/acme/v2.6/llc-2.6.1/

I really plan to go over those and merge them with the new stuff I'm working on, but for now, the PF_LLC BSD sockets interface is very experimental and needs a thorough audit. The minimal code needed for AppleTalk, IPX, Token Ring, etc. should be safe, though. |
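The diagnosis of the fixed-size allocation can be cross-checked against the panic line. Measured from head, data is 0x2f = 47 (the 50-byte reserve minus the 3-byte LLC header), end is 0x80 = 128, and tail is 0x60b = 1547 = 50 + 1497, i.e. where tail landed after the 1497-byte put. The C sketch below models only those offsets, as a hedged illustration: `struct frame_model` and the `frame_*` and `test_rsp_fits_*` functions are hypothetical stand-ins for the kernel's allocate/reserve/push/put steps, not kernel code, and the 128/50 sizes are the ones quoted in the thread. It shows that the fixed allocation leaves 78 bytes after the header, so a 1497-byte echo cannot fit, while sizing the allocation from the payload (the "allocate a larger frame" option) does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Offsets from the start of the buffer ("head"), mirroring the head/data/
 * tail/end pointers printed by skb_over_panic. A toy model, not sk_buff. */
struct frame_model {
    size_t data;   /* first byte of current data */
    size_t tail;   /* one past the last byte of data */
    size_t end;    /* allocation size */
};

#define LLC_FIXED_ALLOC   128  /* sizes quoted in the thread for llc_alloc_frame */
#define LLC_FIXED_RESERVE 50
#define LLC_U_HDR_LEN     3    /* DSAP + SSAP + control */

/* Like an allocation followed by reserving headroom: empty data area. */
struct frame_model frame_alloc(size_t size, size_t reserve)
{
    assert(reserve <= size);
    struct frame_model f = { reserve, reserve, size };
    return f;
}

/* Like skb_push(): grow the data area backwards into the headroom. */
bool frame_push(struct frame_model *f, size_t len)
{
    if (len > f->data)
        return false;
    f->data -= len;
    return true;
}

/* Like skb_put(), but returns false where the kernel would instead call
 * skb_over_panic() and stop. */
bool frame_put(struct frame_model *f, size_t len)
{
    if (len > f->end - f->tail)
        return false;
    f->tail += len;
    return true;
}

/* The crashing sequence: fixed-size frame, LLC header pushed, echoed TEST
 * data appended. Returns whether a payload of this size fits. */
bool test_rsp_fits_fixed(size_t payload)
{
    struct frame_model f = frame_alloc(LLC_FIXED_ALLOC, LLC_FIXED_RESERVE);
    return frame_push(&f, LLC_U_HDR_LEN) && frame_put(&f, payload);
}

/* Hypothetical fix ("allocate a larger frame"): size the buffer from the
 * payload being echoed instead of using a constant. */
bool test_rsp_fits_sized(size_t payload)
{
    struct frame_model f = frame_alloc(LLC_FIXED_RESERVE + payload,
                                       LLC_FIXED_RESERVE);
    return frame_push(&f, LLC_U_HDR_LEN) && frame_put(&f, payload);
}
```

The other option from the thread, copying less data, would correspond to capping `payload` at `LLC_FIXED_ALLOC - LLC_FIXED_RESERVE` before the put; the sketch does not model which of the two the maintainers would prefer.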