Distribution: LFS (Linux From Scratch) based
Hardware Environment: HP Proliant DL580, HP XW6200
Software Environment: 2.6.11.12-vanilla, 2.6.12-vanilla+genpatches-2.6.12-r5

Problem Description: Enabling CONFIG_LLC=y causes the kernel to crash in llc_station_ac_send_test_r. On the other hand, when I compiled 2.6.12 with CONFIG_LLC=n, everything worked fine: the kernel did not crash, whether or not the linkloop_reply daemon was running.

The details: I'm using a custom distribution based on LFS 6.0 (2.6.7, gcc-3.4.1, glibc-2.3.4-cvs). I upgraded to a newer kernel and enabled CONFIG_LLC=y (and CONFIG_LLC2=y). All of my machines (both Proliant and XW) have Broadcom NICs, and I use the tg3 driver for them. If one uses the linkloop package (from SourceForge) to 'ping' the upgraded machine by MAC address, the kernel crashes. In the beginning I thought it was a problem with the tg3 driver, which is why I tried 2.6.12: there were many changes to the tg3 driver (according to the ChangeLog). Applying the latest Gentoo patchset to 2.6.12 did not help either.

Steps to reproduce:
1. Compile LFS 6.0.
2. Upgrade to 2.6.11.11 (or 12).
3. Download the linkloop package from SourceForge.
4. Ping the machine from the network by MAC address using the linkloop tool.
5. See the immediate crash.
Created attachment 5253 [details] crash printout This is the output printed on console when kernel crashes. Since the system was already dead, I've had to write it down manually.
Created attachment 5254 [details] .config file that was used to compile the kernel This is the .config for 2.6.12 + genpatches-2.6.12-5.base + genpatches-2.6.12-5.extras (except for speakup)
Created attachment 5255 [details] linkloop package tarball, version 0.0.1 This is the linkloop package that enables 'pinging' computers by MAC address if they run the linkloop_reply daemon. NOTE: With CONFIG_LLC=y, the kernel crashed even when it was not running the linkloop_reply daemon.
Created attachment 5256 [details] patch for linkloop-0.0.1 The patch fixes some small bugs, performs more precise checks, enables broadcast and uses the new PF_PACKET interface. This patch was applied on the tested platform.
Another note: the crash 'screenshot' was made on the Proliant DL580 machine, but the crash output on the XW6200 machines was very similar. I can provide any additional information (lspci, etc.) on request.
It's not clear - could you please describe which patches, if any, were applied to the vanilla 2.6.12 kernel? Are you able to identify earlier 2.6-based kernels which worked OK? If so, which? 2.6.7, I assume. Have you tried anything later?
Also, there should have been some extra debug output from this:

    printk(KERN_EMERG "skb_over_panic: text:%p len:%d put:%d head:%p "
           "data:%p tail:%p end:%p dev:%s\n",
           here, skb->len, sz, skb->head, skb->data, skb->tail, skb->end,
           skb->dev ? skb->dev->name : "<NULL>");

Can you obtain that?
OK. Here is more info. I've checked the situation with the 2.6.7 kernel:
1. With CONFIG_LLC=n no problem occurs.
2. I set CONFIG_LLC=m and CONFIG_LLC2=m and recompiled, so now I have two modules: llc and llc2.
3. After rebooting the system (2.6.7), the modules are NOT loaded by default.
4. Then:
   a. linkloop <MAC> - all OK
   b. modprobe llc - dmesg says nothing new
   c. linkloop <MAC> - still all OK
   d. modprobe llc2 - dmesg says: 'NET: Registered protocol family 26'
   e. linkloop <MAC> - immediate crash
Note: on 2.6.11/12, llc was compiled into the kernel, not built as modules.

I have to mention that this 2.6.7-based system has been working in 'production' for about half a year and has proved to be stable. I discovered the problem when I tried to upgrade to a newer kernel and noticed the LLC option in menuconfig. Basically I do not use it, but as we see now, 2.6.7 has this bug as well.
Created attachment 5258 [details] .config file for kernel 2.6.7 Note that the 2.6.11/12 kernels were highly modularized, as opposed to 2.6.7
Created attachment 5259 [details] bumps kernel clock to 2 kHz Note that this patch was applied to the 2.6.7 kernel only, and not to 2.6.11/12
Created attachment 5260 [details] Adjusts polling time of Rocket Port driver This patch adjusts the polling time for the Comtrol Rocket Port driver. It was applied to all kernels.
Created attachment 5261 [details] Makes Rocket Port driver 'export' pci-ids This patch makes the driver export the pci-ids of the devices it supports. This helps to autodetect Rocket Port cards. This patch was applied to all kernels.
In addition to the attached patches, 2.6.12 (and only it) was patched with the latest Gentoo patchset (genpatches-2.6.12-5 - http://dev.gentoo.org/~dsd/genpatches/patches-2.6.12-5.htm)
Andrew, can you please explain what exactly you want me to do with this?

    printk(KERN_EMERG "skb_over_panic: text:%p len:%d put:%d head:%p "
           "data:%p tail:%p end:%p dev:%s\n",
           here, skb->len, sz, skb->head, skb->data, skb->tail, skb->end,
           skb->dev ? skb->dev->name : "<NULL>");
You'll have trouble getting any of the net guys to attend to this with so many kernel patches applied. Please reproduce the bug on a vanilla kernel.org kernel. wrt this: printk(KERN_EMERG "skb_over_panic: ... the output from that printk should have appeared on your terminal or in the logs. It might help the net guys fix the bug.
OK. I've reproduced the bug on VANILLA kernels as well, both 2.6.7 and 2.6.12. The crash occurs under the same conditions in exactly the same way. Andrew, here is the printk output you asked for:

skb_over_panic: text:f93e81ae len:1500 put:1497 head:f517de00 data:f517de2f tail:f517e40b end:f517de80 dev:eth0
Created attachment 5268 [details] Screenshot of the vanilla 2.6.12 crash on HP XW6200
skb_over_panic: text:f93e81ae len:1500 put:1497 head:f517de00 data:f517de2f tail:f517e40b end:f517de80 dev:eth0

I'm not familiar with the code or the protocol, but it seems the incoming frame had a size of 1500 bytes and the code tries to copy it into a newly allocated packet (llc_station_ac_send_test_r -> llc_pdu_init_as_test_rsp). The newly allocated packet, however, has only 128 - 50 = 78 bytes available, because llc_alloc_frame uses a fixed size. So the fix would be either to allocate a larger frame or to copy less data. Arnaldo?
Sorry guys, just not enough bandwidth right now. I'm working on getting a better infrastructure in place, and then I'll go over LLC and make it use this infrastructure. Heck, I have lots of patches at: http://www.kernel.org/pub/linux/kernel/people/acme/v2.6/llc-2.6.1/ I really plan to go over those and merge them with the new stuff I'm working on, but for now the PF_LLC BSD sockets interface is very experimental and needs a thorough audit. The minimal code needed for Appletalk, IPX, token ring, etc. should be safe, though.