Bug 7295 - kernel oops when using cisco vpnclient
Summary: kernel oops when using cisco vpnclient
Status: CLOSED INVALID
Alias: None
Product: Drivers
Classification: Unclassified
Component: IEEE1394
Hardware: i386 Linux
Importance: P2 normal
Assignee: drivers_ieee1394
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-10-09 22:53 UTC by Ritesh Raj Sarraf
Modified: 2006-11-05 10:12 UTC

See Also:
Kernel Version: 2.6.18
Subsystem:
Regression: ---
Bisected commit-id:


Description Ritesh Raj Sarraf 2006-10-09 22:53:26 UTC
Most recent kernel where this bug did not occur: 2.6.17

Distribution: Debian

Hardware Environment: 
Dell XPS M1210
Intel Dual Core - 2 Ghz
RAM - 1 GB

Software Environment: Debian testing/unstable on 2.6.18

Problem Description:
I call this a problem because the same Cisco VPN client works perfectly when
the interface in use is not ieee1394.
If I use my LOM (tg3), it works perfectly. Still, the VPN client is a binary
module, so if you feel this report shouldn't be here, I don't mind you closing
it. :-)

Steps to reproduce:
* Install 2.6.18
* Default interface is ieee1394
* Install cisco vpnclient (Cisco Systems VPN Client Version 4.8.00 (0490) 
kernel module loaded)
* Try connecting using the cisco vpnclient

Following is the oops you get:
geeKISSexy:/var/log# cat /tmp/cisco_oops
Oct  7 10:56:04 geeKISSexy kernel: Cisco Systems VPN Client Version 4.8.00 
(0490) kernel module loaded
Oct  7 10:56:36 geeKISSexy kernel: BUG: unable to handle kernel NULL pointer 
dereference at virtual address 00000041
Oct  7 10:56:36 geeKISSexy kernel:  printing eip:
Oct  7 10:56:36 geeKISSexy kernel: f9121109
Oct  7 10:56:36 geeKISSexy kernel: *pde = 00000000
Oct  7 10:56:36 geeKISSexy kernel: Oops: 0000 [#1]
Oct  7 10:56:36 geeKISSexy kernel: PREEMPT SMP
Oct  7 10:56:36 geeKISSexy kernel: Modules linked in: cisco_ipsec appletalk 
ax25 ipx p8023 nvidia agpgart ipv6 binfmt_misc cpufreq_ondemand 
cpufreq_userspace cpufreq_powersave speedstep_centrino freq_table rfcomm l2cap 
bluetooth button ac battery ipw3945 ieee80211 ieee80211_crypt firmware_class 
dm_snapshot dm_mirror sbp2 joydev mousedev tsdev snd_hda_intel snd_hda_codec 
snd_pcm_oss snd_mixer_oss snd_pcm snd_timer psmouse sg snd serio_raw i2c_i801 
soundcore eth1394 sdhci mmc_core evdev i2c_core snd_page_alloc sr_mod rtc cdrom 
uhci_hcd sd_mod ohci1394 ehci_hcd b44 mii usbcore ieee1394 thermal processor 
fan
Oct  7 10:56:36 geeKISSexy kernel: CPU:    0
Oct  7 10:56:36 geeKISSexy kernel: EIP:    0060:[<f9121109>]    Tainted: P      
VLI
Oct  7 10:56:36 geeKISSexy kernel: EFLAGS: 00010202   (2.6.17-my-patches-xps1 
#2)
Oct  7 10:56:36 geeKISSexy kernel: EIP is at CniGetBindingByIndex+0xf/0x21 
[cisco_ipsec]
Oct  7 10:56:36 geeKISSexy kernel: eax: f91721f0   ebx: 80002078   ecx: 
00000003   edx: 00000001
Oct  7 10:56:36 geeKISSexy kernel: esi: f9172208   edi: c6ad3ee4   ebp: 
c6ad3d48   esp: c6ad3d2c
Oct  7 10:56:36 geeKISSexy kernel: ds: 007b   es: 007b   ss: 0068
Oct  7 10:56:36 geeKISSexy kernel: Process cvpnd (pid: 14019, 
threadinfo=c6ad2000 task=dff47550)
Oct  7 10:56:36 geeKISSexy kernel: Stack: f9124cf1 00000003 00490024 c02d3337 
c01ab25d 000000d8 00000002 c6ad3ec8
Oct  7 10:56:36 geeKISSexy kernel:        f9129250 00000000 00000003 f9172210 
f917221a 2938fea9 257003ca 00000246
Oct  7 10:56:36 geeKISSexy kernel:        00000046 dfbc1640 000000d0 00000000 
00000000 d0018810 c17ebe00 c0187ef5
Oct  7 10:56:36 geeKISSexy kernel: Call Trace:
Oct  7 10:56:36 geeKISSexy kernel:  <f9124cf1> 
ConfigurePublicInterface+0x11/0x70 [cisco_ipsec]  <c02d3337> 
_spin_unlock+0xd/0x21
Oct  7 10:56:36 geeKISSexy kernel:  <c01ab25d> find_revoke_record+0x73/0x7c  
<f9129250> CniPluginIOCTL+0x450/0x640 [cisco_ipsec]
Oct  7 10:56:36 geeKISSexy kernel:  <c0187ef5> proc_alloc_inode+0x3e/0x63  
<c02d3215> _spin_lock+0xd/0x5a
Oct  7 10:56:36 geeKISSexy kernel:  <c02d3337> _spin_unlock+0xd/0x21  
<c018aec1> proc_lookup+0xa0/0xbf
Oct  7 10:56:36 geeKISSexy kernel:  <c01683b5> do_lookup+0x4f/0x135  <c0170df6> 
dput+0x1a/0x11b
Oct  7 10:56:36 geeKISSexy kernel:  <c0146b21> 
__mod_page_state_offset+0x11/0x1f  <c014738e> 
get_page_from_freelist+0x1d1/0x35b
Oct  7 10:56:36 geeKISSexy kernel:  <c01213e7> local_bh_enable+0x68/0x7e  
<c027ea6d> neigh_lookup+0xed/0xf7
Oct  7 10:56:36 geeKISSexy kernel:  <c02b1e9b> arp_ioctl+0x56c/0x5a1  
<c0146b21> __mod_page_state_offset+0x11/0x1f
Oct  7 10:56:36 geeKISSexy kernel:  <c02d3337> _spin_unlock+0xd/0x21  
<c014e04b> __handle_mm_fault+0x6df/0x707
Oct  7 10:56:36 geeKISSexy kernel:  <f9121a4b> interceptor_ioctl+0x0/0x2bd 
[cisco_ipsec]  <f9121ab4> interceptor_ioctl+0x69/0x2bd [cisco_ipsec]
Oct  7 10:56:36 geeKISSexy kernel:  <c027afbb> dev_ifsioc+0x362/0x37c  
<c0271923> sock_ioctl+0x0/0x1c2
Oct  7 10:56:36 geeKISSexy kernel:  <c027b55b> dev_ioctl+0x3da/0x46b  
<c02d3337> _spin_unlock+0xd/0x21
Oct  7 10:56:36 geeKISSexy kernel:  <c01703e0> d_rehash+0x5c/0x69  <c02d3337> 
_spin_unlock+0xd/0x21
Oct  7 10:56:36 geeKISSexy kernel:  <c027231b> sock_attach_fd+0x6c/0xcc  
<c015b31c> fd_install+0x24/0x50
Oct  7 10:56:36 geeKISSexy kernel:  <c0271923> sock_ioctl+0x0/0x1c2  <c016c294> 
do_ioctl+0x1c/0x5d
Oct  7 10:56:36 geeKISSexy kernel:  <c016c51f> vfs_ioctl+0x24a/0x25c  
<c0273027> sys_socketcall+0x51/0x181
Oct  7 10:56:36 geeKISSexy kernel:  <c016c579> sys_ioctl+0x48/0x5f  <c0102cb3> 
sysenter_past_esp+0x54/0x75
Oct  7 10:56:36 geeKISSexy kernel: Code: 42 83 c0 2c 3d f0 21 17 f9 75 ed b8 00 
00 51 24 89 13 eb 05 b8 06 00 51 e4 5b 5e c3 8b 4c 24 04 b8 80 1e 17 f9 8b 10 
85 d2 74 05 <3b> 4a 40 74 0c 83 c0 2c 3d 1c 22 17 f9 75 eb 31 c0 c3 55 31 c9
Oct  7 10:56:36 geeKISSexy kernel: EIP: [<f9121109>] 
CniGetBindingByIndex+0xf/0x21 [cisco_ipsec] SS:ESP 0068:c6ad3d2c
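
The key fields of the oops above can be pulled out mechanically. The following is a minimal sketch, assuming the hard-wrapped log lines have been rejoined and the syslog prefix stripped; `parse_oops` and `OOPS` are hypothetical names used only for illustration. It also checks one observation: the marked bytes `<3b> 4a 40` in the Code: line decode on i386 to `cmp ecx, [edx+0x40]`, so with edx = 00000001 the access lands at 0x41, which matches the reported fault address.

```python
import re

# Excerpt of the oops above, with wrapped lines rejoined and the
# "Oct  7 ... kernel:" prefix removed (an assumption of this sketch).
OOPS = """\
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000041
EIP is at CniGetBindingByIndex+0xf/0x21 [cisco_ipsec]
eax: f91721f0   ebx: 80002078   ecx: 00000003   edx: 00000001
"""


def parse_oops(text):
    """Return (fault_address, symbol, module, registers) from an i386 oops excerpt."""
    fault = int(re.search(r"virtual address ([0-9a-f]{8})", text).group(1), 16)
    symbol, module = re.search(
        r"EIP is at (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(\w+)\]", text
    ).groups()
    # Register dump entries look like "eax: f91721f0".
    registers = {
        name: int(value, 16)
        for name, value in re.findall(r"\b(e[a-z]{2}):\s*([0-9a-f]{8})", text)
    }
    return fault, symbol, module, registers


fault, symbol, module, registers = parse_oops(OOPS)
print(f"fault at {fault:#x} in {symbol} [{module}]")
# cmp ecx, [edx+0x40]: the dereferenced address is edx plus the 0x40 displacement.
print("edx + 0x40 matches fault address:", registers["edx"] + 0x40 == fault)
```

The faulting function belongs to cisco_ipsec, not to an in-tree driver, which is consistent with the module name printed on the EIP line.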
Comment 1 Adrian Bunk 2006-10-09 23:55:27 UTC
Bugs with binary-only modules loaded (you are using at least two of them) are
not debuggable.

Please ask the vendors of these modules for support.
Comment 2 Stefan Richter 2006-10-09 23:57:41 UTC
If this didn't happen in 2.6.17, it would be helpful if you could check for
possible culprits among the ieee1394 driver updates from 2.6.17 to 2.6.18:
http://me.in-berlin.de/~s5r6/linux1394/merged/in_2.6.18/

I could bring this patch collection into proper order so that you can bisect
them (e.g. with quilt). Should I prepare this on top of plain 2.6.17 or on any
2.6.17.x? There were also ieee1394 patches in 2.6.17.2, .8, .11. Bisecting on
top of plain 2.6.17 would check these -stable patches too.

We could also stack the reverse of the patches on top of 2.6.18, or at least
almost all of them.

There was only a single patch to eth1394 which you could test first.
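
The bisection idea can be sketched as follows. This is only an illustration under stated assumptions: the patch names are made up, `is_bad(n)` stands in for "build and boot a kernel with the first n patches applied, then try the VPN client", and the breakage is assumed to persist once the culprit patch is applied.

```python
def bisect_patches(patches, is_bad):
    """Return the first patch in an ordered series that introduces the failure.

    is_bad(n) reports whether the tree with the first n patches applied
    shows the bug. Assumes is_bad(0) is False (known good base),
    is_bad(len(patches)) is True (known bad tip), and that the bug stays
    present after the culprit is applied.
    """
    good, bad = 0, len(patches)
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid   # culprit is among patches[good:mid]
        else:
            good = mid  # culprit is among patches[mid:bad]
    return patches[bad - 1]


# Hypothetical series of 20 patches where the 13th one breaks things.
series = [f"patch-{i:02d}" for i in range(1, 21)]
tested = []


def is_bad(n):
    tested.append(n)  # record each kernel that would have to be built
    return n >= 13


culprit = bisect_patches(series, is_bad)
print(culprit, "found after", len(tested), "test builds")
```

Each step halves the remaining range, so a series of a few dozen patches needs only a handful of test kernels.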
Comment 3 Ritesh Raj Sarraf 2006-11-02 09:44:59 UTC
In reply to comment #1:

If a binary-only module is not acceptable and you won't fix it, why don't you
simply deny loading of such modules? If you want to stop binary-only modules,
don't provide a framework for them.

First you show the path, and then you mandate walking it your way. Why not
first teach them to work your way and then give them access to the path?
Comment 4 Adrian Bunk 2006-11-02 11:29:55 UTC
The module support was not made for binary-only modules.

Whether binary-only modules are legal at all is a disputed question only lawyers
can decide.

But the reason why bugs with binary-only modules loaded are invalid here is that
a module can do ANYTHING, and we've already had too many seemingly unrelated
bugs that turned out to be caused by binary-only modules, and that were
undebuggable since we don't have the source for the binary-only modules.
Comment 5 Stefan Richter 2006-11-02 11:31:36 UTC
Adrian said they are _not debuggable_ rather than _not acceptable_. You can load
them, and you can try to debug them on your own or with the help of the authors
of this module.

Before you do so you could check for a potential regression of eth1394 like I
suggested. Please say so if you would like to get the 2.6.17-to-2.6.18 FireWire
patch series rearranged for this purpose. But you could also use git to do so,
using a clone of Linus' tree and bisecting between the known good and bad
snapshots. If you find the point where it broke, we can try to get a clue if the
issue is with
Linux or with the VPN client. But the findings could also turn out inconclusive,
which would shift the burden to Cisco.
Comment 6 Stefan Richter 2006-11-02 11:34:34 UTC
PS: I absolutely agree to keep this bug 'REJECTED INVALID', unless rrs is able
to dig out an actual kernel bug.
Comment 7 Ritesh Raj Sarraf 2006-11-04 06:51:45 UTC
Hi,

I was just looking into the installer package of the cisco vpnclient.

You mentioned in comment #1 seeing 2 binary-only modules loaded. One is
nvidia. Can you tell me which the other one is, please?

I hope that my understanding that ipw3945 is not a binary-only module is
correct. If so, then the cisco_ipsec module shouldn't qualify as a binary-only
module either. The source code to build the kernel module is provided in
the tar.gz file. Like ipw3945, it copies its binary daemon (cvpnd),
libraries and init scripts.

The files that build the cisco_ipsec module are provided with the package.

Would be great if you could have a look to see if it really is a candidate for 
a binary-only module.
Comment 8 Stefan Richter 2006-11-04 07:44:48 UTC
Unless module authors play tricks, 'dmesg | grep taint' should show which
modules tainted the kernel, AFAIK. BTW the term 'binary only' is a bit
misleading: nVidia's driver for example comes in several parts; two of them run
in the kernel's address space: A thin open source interface layer and the actual
kernel driver which is closed source. AFAIK cisco_ipsec, i.e. the component that
is loaded into the kernel's address space, is partly closed source too. Or did
they release it as open source now?
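
The taint state is also visible in the oops itself, on the EIP line ("Tainted: P"). Below is a minimal sketch of decoding that field. The flag table is an assumption based on the tainted-kernels section of Documentation/oops-tracing.txt for kernels of this era and should be checked against the tree in use; `decode_taint` and `TAINT_FLAGS` are hypothetical names.

```python
import re

# Assumed flag letters for 2.6.18-era kernels; verify against
# Documentation/oops-tracing.txt in the kernel tree being debugged.
TAINT_FLAGS = {
    "P": "proprietary (non-GPL) module loaded",
    "G": "only GPL-compatible modules loaded",
    "F": "module force-loaded",
    "S": "SMP kernel on hardware not certified SMP-safe",
    "R": "module force-unloaded",
    "M": "machine check exception occurred",
    "B": "bad page state encountered",
}


def decode_taint(line):
    """Decode the fixed-width flag field that follows 'Tainted: '."""
    match = re.search(r"Tainted: (.{1,6})", line)
    if match is None:
        return []  # e.g. "Not tainted", or no taint field on this line
    return [
        TAINT_FLAGS.get(flag, "unknown flag " + flag)
        for flag in match.group(1)
        if flag != " "
    ]


# EIP line from the oops in the description, with the wrapped "VLI" rejoined.
eip_line = "EIP:    0060:[<f9121109>]    Tainted: P      VLI"
print(decode_taint(eip_line))
```

Note that the flag only says that some tainting module was loaded; it does not name the module, which is why the log has to be searched for the taint message.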
Comment 9 Adrian Bunk 2006-11-04 15:58:05 UTC
One binary-only module (nvidia) makes it undebuggable.

At least three external modules are loaded, and even one of them alone might
make it off-topic here.

Even further, you seem to simply ignore Stefan's request to check whether any of
the ieee1394 updates in 2.6.18 caused your breakage.

There are rules about what is off-topic here and what is on-topic. Limits simply
have to be set, since the (often spare) time people spend on debugging
kernel bugs here is not unlimited.

There are people offering technical consultancy who might spend as much time as
you want to pay them for on helping you debug your problems.
Comment 10 Ritesh Raj Sarraf 2006-11-05 10:12:28 UTC
On request, I could have reproduced the bug without the nvidia module.

Yes, I've been avoiding Stefan's request because:
a) I'm not a Kernel Developer
b) Nor a Q.A.
c) I'm an end-user using Linux on my laptop busy with my own deadlines

I filed the bug because I found it. And I tend to report such
bugs so that people remain aware.

The argument of whether it is a valid bug or not, whether binary-only modules 
should be allowed or not, is not my domain.
Hence, closing it.
