Most recent kernel where this bug did not occur: 2.6.17
Distribution: Debian
Hardware Environment: Dell XPS M1210 Intel Dual Core - 2 Ghz RAM - 1 GB
Software Environment: Debian testing/unstable on 2.6.18

Problem Description: I'm terming it a problem because the same cisco vpn client works perfect when the interface being used is not ieee1394. If I use my LOM (tg3), it works perfect. But still it is a binary module, so if you feel it shouldn't be here, I don't mind you closing it. :-)

Steps to reproduce:
* Install 2.6.18
* Default interface is ieee1394
* Install cisco vpnclient (Cisco Systems VPN Client Version 4.8.00 (0490) kernel module loaded)
* Try connecting using the cisco vpnclient

Following is the oops you get:

geeKISSexy:/var/log# cat /tmp/cisco_oops
Oct 7 10:56:04 geeKISSexy kernel: Cisco Systems VPN Client Version 4.8.00 (0490) kernel module loaded
Oct 7 10:56:36 geeKISSexy kernel: BUG: unable to handle kernel NULL pointer dereference at virtual address 00000041
Oct 7 10:56:36 geeKISSexy kernel: printing eip:
Oct 7 10:56:36 geeKISSexy kernel: f9121109
Oct 7 10:56:36 geeKISSexy kernel: *pde = 00000000
Oct 7 10:56:36 geeKISSexy kernel: Oops: 0000 [#1]
Oct 7 10:56:36 geeKISSexy kernel: PREEMPT SMP
Oct 7 10:56:36 geeKISSexy kernel: Modules linked in: cisco_ipsec appletalk ax25 ipx p8023 nvidia agpgart ipv6 binfmt_misc cpufreq_ondemand cpufreq_userspace cpufreq_powersave speedstep_centrino freq_table rfcomm l2cap bluetooth button ac battery ipw3945 ieee80211 ieee80211_crypt firmware_class dm_snapshot dm_mirror sbp2 joydev mousedev tsdev snd_hda_intel snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer psmouse sg snd serio_raw i2c_i801 soundcore eth1394 sdhci mmc_core evdev i2c_core snd_page_alloc sr_mod rtc cdrom uhci_hcd sd_mod ohci1394 ehci_hcd b44 mii usbcore ieee1394 thermal processor fan
Oct 7 10:56:36 geeKISSexy kernel: CPU: 0
Oct 7 10:56:36 geeKISSexy kernel: EIP: 0060:[<f9121109>] Tainted: P VLI
Oct 7 10:56:36 geeKISSexy kernel: EFLAGS: 00010202 (2.6.17-my-patches-xps1 #2)
Oct 7 10:56:36 geeKISSexy kernel: EIP is at CniGetBindingByIndex+0xf/0x21 [cisco_ipsec]
Oct 7 10:56:36 geeKISSexy kernel: eax: f91721f0 ebx: 80002078 ecx: 00000003 edx: 00000001
Oct 7 10:56:36 geeKISSexy kernel: esi: f9172208 edi: c6ad3ee4 ebp: c6ad3d48 esp: c6ad3d2c
Oct 7 10:56:36 geeKISSexy kernel: ds: 007b es: 007b ss: 0068
Oct 7 10:56:36 geeKISSexy kernel: Process cvpnd (pid: 14019, threadinfo=c6ad2000 task=dff47550)
Oct 7 10:56:36 geeKISSexy kernel: Stack: f9124cf1 00000003 00490024 c02d3337 c01ab25d 000000d8 00000002 c6ad3ec8
Oct 7 10:56:36 geeKISSexy kernel: f9129250 00000000 00000003 f9172210 f917221a 2938fea9 257003ca 00000246
Oct 7 10:56:36 geeKISSexy kernel: 00000046 dfbc1640 000000d0 00000000 00000000 d0018810 c17ebe00 c0187ef5
Oct 7 10:56:36 geeKISSexy kernel: Call Trace:
Oct 7 10:56:36 geeKISSexy kernel: <f9124cf1> ConfigurePublicInterface+0x11/0x70 [cisco_ipsec] <c02d3337> _spin_unlock+0xd/0x21
Oct 7 10:56:36 geeKISSexy kernel: <c01ab25d> find_revoke_record+0x73/0x7c <f9129250> CniPluginIOCTL+0x450/0x640 [cisco_ipsec]
Oct 7 10:56:36 geeKISSexy kernel: <c0187ef5> proc_alloc_inode+0x3e/0x63 <c02d3215> _spin_lock+0xd/0x5a
Oct 7 10:56:36 geeKISSexy kernel: <c02d3337> _spin_unlock+0xd/0x21 <c018aec1> proc_lookup+0xa0/0xbf
Oct 7 10:56:36 geeKISSexy kernel: <c01683b5> do_lookup+0x4f/0x135 <c0170df6> dput+0x1a/0x11b
Oct 7 10:56:36 geeKISSexy kernel: <c0146b21> __mod_page_state_offset+0x11/0x1f <c014738e> get_page_from_freelist+0x1d1/0x35b
Oct 7 10:56:36 geeKISSexy kernel: <c01213e7> local_bh_enable+0x68/0x7e <c027ea6d> neigh_lookup+0xed/0xf7
Oct 7 10:56:36 geeKISSexy kernel: <c02b1e9b> arp_ioctl+0x56c/0x5a1 <c0146b21> __mod_page_state_offset+0x11/0x1f
Oct 7 10:56:36 geeKISSexy kernel: <c02d3337> _spin_unlock+0xd/0x21 <c014e04b> __handle_mm_fault+0x6df/0x707
Oct 7 10:56:36 geeKISSexy kernel: <f9121a4b> interceptor_ioctl+0x0/0x2bd [cisco_ipsec] <f9121ab4> interceptor_ioctl+0x69/0x2bd [cisco_ipsec]
Oct 7 10:56:36 geeKISSexy kernel: <c027afbb> dev_ifsioc+0x362/0x37c <c0271923> sock_ioctl+0x0/0x1c2
Oct 7 10:56:36 geeKISSexy kernel: <c027b55b> dev_ioctl+0x3da/0x46b <c02d3337> _spin_unlock+0xd/0x21
Oct 7 10:56:36 geeKISSexy kernel: <c01703e0> d_rehash+0x5c/0x69 <c02d3337> _spin_unlock+0xd/0x21
Oct 7 10:56:36 geeKISSexy kernel: <c027231b> sock_attach_fd+0x6c/0xcc <c015b31c> fd_install+0x24/0x50
Oct 7 10:56:36 geeKISSexy kernel: <c0271923> sock_ioctl+0x0/0x1c2 <c016c294> do_ioctl+0x1c/0x5d
Oct 7 10:56:36 geeKISSexy kernel: <c016c51f> vfs_ioctl+0x24a/0x25c <c0273027> sys_socketcall+0x51/0x181
Oct 7 10:56:36 geeKISSexy kernel: <c016c579> sys_ioctl+0x48/0x5f <c0102cb3> sysenter_past_esp+0x54/0x75
Oct 7 10:56:36 geeKISSexy kernel: Code: 42 83 c0 2c 3d f0 21 17 f9 75 ed b8 00 00 51 24 89 13 eb 05 b8 06 00 51 e4 5b 5e c3 8b 4c 24 04 b8 80 1e 17 f9 8b 10 85 d2 74 05 <3b> 4a 40 74 0c 83 c0 2c 3d 1c 22 17 f9 75 eb 31 c0 c3 55 31 c9
Oct 7 10:56:36 geeKISSexy kernel: EIP: [<f9121109>] CniGetBindingByIndex+0xf/0x21 [cisco_ipsec] SS:ESP 0068:c6ad3d2c
Bugs with binary-only modules loaded (you are using at least two of them) are not debuggable. Please ask the vendors of these modules for support.
If this didn't happen in 2.6.17, it would be helpful if you could check for possible culprits among the ieee1394 driver updates from 2.6.17 to 2.6.18: http://me.in-berlin.de/~s5r6/linux1394/merged/in_2.6.18/ I could bring this patch collection into proper order so that you can bisect them (e.g. with quilt). Should I prepare this on top of plain 2.6.17 or on any 2.6.17.x? There were also ieee1394 patches in 2.6.17.2, .8, .11. Bisecting on top of plain 2.6.17 would check these -stable patches too. We could also stack the reverse of the patches on top of 2.6.18, or at least almost all of them. There was only a single patch to eth1394 which you could test first.
In reply to comment #1: If a binary-only module is not acceptable and you won't fix it, why don't you simply deny loading of such modules? If you want to stop binary-only modules, don't provide any framework for them. First you show the path, and then you mandate that we walk your way. Why not first teach them to work your way and then give them access to the path?
The module support was not made for binary-only modules. Whether binary-only modules are legal at all is a disputed question only lawyers can decide. But the reason why bugs with binary-only modules loaded are invalid here is that a module can do ANYTHING, and we've already had too many seemingly unrelated bugs that turned out to be caused by binary-only modules, and that were undebuggable since we don't have the source for the binary-only modules.
Adrian said they are _not debuggable_ rather than _not acceptable_. You can load them, and you can try to debug them on your own or with the help of the authors of this module. Before you do so you could check for a potential regression of eth1394 like I suggested. Please say so if you would like to get the 2.6.17-to-2.6.18 FireWire patch series rearranged for this purpose. But you could also use git to do so, using a clone of Linus' tree and bisecting between the known good and bad snapshots. If you find the point where it broke, we can try to get a clue whether the issue is with Linux or with the VPN client. But the findings could also turn out inconclusive, which would shift the burden to Cisco.
PS: I absolutely agree to keep this bug 'REJECTED INVALID', unless rrs is able to dig out an actual kernel bug.
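For reference, the git-based bisection suggested above could look roughly like the sketch below. It is only an illustration under assumptions that are not stated in this thread: that the clone carries the release tags v2.6.17 and v2.6.18, and that the FireWire drivers live under drivers/ieee1394. The function name start_ieee1394_bisect is made up for this example.

```shell
#!/bin/sh
# Sketch only. Run inside a clone of Linus' tree.
# Assumptions (not from the thread): the release tags are named v2.6.17 and
# v2.6.18, and the FireWire drivers live under drivers/ieee1394.
start_ieee1394_bisect() {
    good=${1:-v2.6.17}   # last release known to work
    bad=${2:-v2.6.18}    # first release known to oops
    # Only consider commits that touch the FireWire drivers.
    git bisect start "$bad" "$good" -- drivers/ieee1394
}

# After start_ieee1394_bisect, git checks out a candidate commit.
# Build and boot it, try the VPN connection, then report the result:
#   git bisect good     # connection works
#   git bisect bad      # oops reproduces
# Repeat until git prints the first bad commit, then clean up with:
#   git bisect reset
```

Restricting the search to one directory keeps the number of kernels to build small, but it will miss a regression that was introduced elsewhere in the tree. If the restricted run ends without a convincing culprit, the same commands without the path argument search all commits between the two releases.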
Hi, I was just looking into the installer package of the cisco vpnclient. You mentioned in comment #1 seeing 2 binary-only modules loaded. One is nvidia. Can you tell me the other one, please? I hope that my understanding that ipw3945 is not a binary-only module is correct. If yes, then the cisco_ipsec module shouldn't qualify as a binary-only module either. The source code to build the kernel module is provided in the tar.gz file. It, same as ipw3945, copies its binary daemon (cvpnd), libraries and init scripts. The files that build the cisco_ipsec module are provided with the package. It would be great if you could have a look to see if it really is a candidate for a binary-only module.
Unless module authors play tricks, 'dmesg | grep taint' should show which modules tainted the kernel, AFAIK. BTW the term 'binary only' is a bit misleading: nVidia's driver, for example, comes in several parts; two of them run in the kernel's address space: a thin open source interface layer and the actual kernel driver, which is closed source. AFAIK cisco_ipsec, i.e. the component that is loaded into the kernel's address space, is partly closed source too. Or did they release it as open source now?
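As a complement to grepping dmesg, the current taint state can also be read as a number from /proc/sys/kernel/tainted. The sketch below only checks the lowest bit of that number, on the assumption that bit 0 (value 1) is the proprietary-module flag shown as "P" after "Tainted:" in the oops. It does not name the module that set the flag, and the helper name proprietary_taint is made up for this example.

```shell
#!/bin/sh
# Sketch: report whether the proprietary-module taint bit is set.
# Assumption: bit 0 (value 1) of the taint mask means a proprietary module
# was loaded, matching the "P" shown after "Tainted:" in an oops.
proprietary_taint() {
    mask=$1
    if [ $(( mask & 1 )) -ne 0 ]; then
        echo "proprietary module loaded"
    else
        echo "no proprietary module taint"
    fi
}

# Typical use on a running system (the file is absent on non-Linux hosts):
if [ -r /proc/sys/kernel/tainted ]; then
    proprietary_taint "$(cat /proc/sys/kernel/tainted)"
fi
```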
One binary-only module (nvidia) makes it undebuggable. At least three external modules are loaded, and even one of them alone might make it off-topic here. Even further, you seem to simply ignore Stefan's request to check whether any of the ieee1394 updates in 2.6.18 caused your breakage. There are rules about what is off-topic here and what is on-topic. It's simply required to set limits, since the (often spare) time people are spending on debugging kernel bugs here is not unlimited. There are people offering technical consultancy who might spend as much time as you want to pay them for on helping you debug your problems.
On request, I could have reproduced the bug without the nvidia module. Yes, I've been avoiding Stefan's request because:
a) I'm not a Kernel Developer
b) Nor a Q.A.
c) I'm an end-user using Linux on my laptop busy with my own deadlines
I filed the bug because I found it. And I keep the tendency to report such bugs so that people remain aware. The argument of whether it is a valid bug or not, whether binary-only modules should be allowed or not, is not my domain. Hence, closing it.