Bug 8067

Summary: paging oops in keyring_destroy
Product: Other Reporter: Thomas Klute (klute)
Component: Other Assignee: David Howells (dhowells)
Status: REJECTED DUPLICATE    
Severity: high CC: bunk
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.18 Subsystem:
Regression: --- Bisected commit-id:

Description Thomas Klute 2007-02-23 01:06:02 UTC
Problem Description:
BUG: unable to handle kernel paging request at virtual address 00200200
Call Trace:
[<c04bbb80>] keyring_destroy+0x28/0x65

We have seen this bug twice, on two different machines. Other reports are here:

http://forums.fedoraforum.org/showthread.php?t=146489
http://permalink.gmane.org/gmane.linux.debian.devel.bugs.general/171195
http://phinloda.jugem.cc/?cid=3
http://megagate.ru/news/arc/96/22007/286630.shtm

Both machines had run stably and without problems for several weeks. One got a
kernel update to 2.6.18; the other was a new installation of FC6.
Nothing special was going on on either machine when the oops
occurred, and the problem occurred only once on each machine.
Thus we are unable to reproduce it at the moment. It seems to be a kernel bug,
because the posts linked above report the same problem on Debian as well.
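
Note on the faulting address: in both traces below, EIP is in list_del and
eax holds 00200200. As far as I know, 00200200 is the value that kernels of
this era write into the prev pointer of an entry once it has been removed
from a list (LIST_POISON2 in the kernel headers). If that reading is right,
the oops would be consistent with list_del being called on an entry that was
already deleted, though the traces alone do not prove it. The following is a
minimal user-space sketch of the poisoning idea, not the kernel's code; the
names list_add_entry and checked_list_del are hypothetical and exist only for
this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-in for the kernel's doubly linked list node. */
struct list_head {
    struct list_head *next, *prev;
};

/* Poison values (assumed to match the i386 kernel headers of this era). */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0x00100100)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0x00200200)

/* Make head an empty circular list. */
static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Hypothetical helper: insert entry right after head. */
static void list_add_entry(struct list_head *entry, struct list_head *head)
{
    assert(entry != NULL && head != NULL);
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/*
 * Hypothetical helper: unlink entry and poison its pointers.
 * Returns 0 on success, or -1 if the entry is already poisoned.
 * Without that check, a second call would dereference entry->prev,
 * i.e. address 0x00200200, which is the access that faults in the oops.
 */
static int checked_list_del(struct list_head *entry)
{
    if (entry->prev == LIST_POISON2 || entry->next == LIST_POISON1)
        return -1;                  /* entry was already deleted */

    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    entry->next = LIST_POISON1;
    entry->prev = LIST_POISON2;
    return 0;
}
```

In this sketch, calling checked_list_del twice on the same entry returns 0
the first time and -1 the second time. A deletion routine without the guard
would instead read through the poisoned prev pointer on the second call.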

=== Case 1 ===
Distribution: Fedora Core 6
Kernel Version: 2.6.18-1.2849.fc6
Hardware Environment: 
vendor_id       : GenuineIntel
cpu family      : 6
model           : 8
model name      : Pentium III (Coppermine)
stepping        : 3
cpu MHz         : 801.871
cache size      : 256 KB

Software Environment: Fedora Core 6, used as a Webserver, no X
Steps to reproduce: don't know
Kernel oops trace:
Jan 25 08:50:59 vserver1 kernel: BUG: unable to handle kernel paging request at
virtual address 00200200
Jan 25 08:50:59 vserver1 kernel:  printing eip:
Jan 25 08:50:59 vserver1 kernel: c04e99d1
Jan 25 08:50:59 vserver1 kernel: *pde = 18c3e067
Jan 25 08:50:59 vserver1 kernel: Oops: 0000 [#1]
Jan 25 08:50:59 vserver1 kernel: SMP
Jan 25 08:50:59 vserver1 kernel: last sysfs file:
/devices/pci0000:00/0000:00:04.3/i2c-0/0-002d/beep_enable
Jan 25 08:50:59 vserver1 kernel: Modules linked in: ipv6 autofs4 w83781d
hwmon_vid hwmon i2c_isa eeprom i2c_matroxfb i2c_algo_bit matroxfb_base
matroxfb_DAC1064 matroxfb_accel matroxfb_Ti3026 matroxfb_g450 g450_pll
matroxfb_misc dm_mirror dm_mod video sbs i2c_ec button battery asus_acpi ac
parport_pc lp parport 8139cp 8139too mii pcspkr serio_raw i2c_piix4 i2c_core
raid1 ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Jan 25 08:50:59 vserver1 kernel: CPU:    0
Jan 25 08:50:59 vserver1 kernel: EIP:    0060:[<c04e99d1>]    Not tainted VLI
Jan 25 08:50:59 vserver1 kernel: EFLAGS: 00010292   (2.6.18-1.2849.fc6 #1)
Jan 25 08:50:59 vserver1 kernel: EIP is at list_del+0x9/0x6c
Jan 25 08:50:59 vserver1 kernel: eax: 00200200   ebx: c9fe7e78   ecx: 00000000 
 edx: 00000000
Jan 25 08:50:59 vserver1 kernel: esi: c9fe7e28   edi: c16ec2a0   ebp: 00000282 
 esp: effdbf38
Jan 25 08:50:59 vserver1 kernel: ds: 007b   es: 007b   ss: 0068
Jan 25 08:50:59 vserver1 kernel: Process events/0 (pid: 5, ti=effdb000
task=efe20030 task.ti=effdb000)
Jan 25 08:50:59 vserver1 kernel: Stack: d74899c4 00000000 d74899a0 c9fe7e20
c04bbb80 c9fe7e20 c9fe7e28 c16ec2a0
Jan 25 08:50:59 vserver1 kernel:        c04bb7c7 c068b260 c068b264 c0433c38
00000246 c16ec2a0 c16ec2c0 c04bb710
Jan 25 08:50:59 vserver1 kernel:        00000000 c16ec2c0 c16ec2a0 c16ec2b8
00000000 c0434528 00000001 00000000
Jan 25 08:50:59 vserver1 kernel: Call Trace:
Jan 25 08:50:59 vserver1 kernel:  [<c04bbb80>] keyring_destroy+0x28/0x65
Jan 25 08:50:59 vserver1 kernel:  [<c04bb7c7>] key_cleanup+0xb7/0xd0
Jan 25 08:50:59 vserver1 kernel:  [<c0433c38>] run_workqueue+0x83/0xc5
Jan 25 08:50:59 vserver1 kernel:  [<c0434528>] worker_thread+0xd9/0x10d
Jan 25 08:50:59 vserver1 kernel:  [<c04369fb>] kthread+0xc0/0xed
Jan 25 08:50:59 vserver1 kernel:  [<c0404dab>] kernel_thread_helper+0x7/0x10


=== Case 2 ===
Distribution: Fedora Core 6
Kernel Version: 2.6.18-1.2868.fc6
Hardware Environment: Dell PowerEdge 2850
Software Environment: Fedora Core 6, Webserver, Mailserver, Samba-Server
2 Xeon processors; /proc/cpuinfo (only for the first of the 4 logical CPUs):
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Xeon(TM) CPU 2.80GHz
stepping        : 3
cpu MHz         : 2793.500
cache size      : 2048 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pni
monitor ds_cpl cid cx16 xtpr
bogomips        : 5589.62

Steps to reproduce: don't know
Kernel oops trace:
Feb 22 13:20:44 mailserver kernel: BUG: unable to handle kernel paging request
at virtual address 00200200
Feb 22 13:20:44 mailserver kernel:  printing eip:
Feb 22 13:20:44 mailserver kernel: c04e9991
Feb 22 13:20:44 mailserver kernel: *pde = 19e45067
Feb 22 13:20:44 mailserver kernel: Oops: 0000 [#1]
Feb 22 13:20:44 mailserver kernel: SMP
Feb 22 13:20:44 mailserver kernel: last sysfs file: /class/net/eth0/address
Feb 22 13:20:44 mailserver kernel: Modules linked in: nfs lockd fscache nfs_acl
drbd(U) autofs4 sunrpc ipv6 dm_mirror dm_mod video sbs i2c_ec i2c_core button
battery asus_acpi ac parport_pc lp parport sg e752x_edac ide_cd floppy e1000
edac_mc cdrom pcspkr serio_raw megaraid_mbox sd_mod scsi_mod megaraid_mm ext3
jbd ehci_hcd ohci_hcd uhci_hcd
Feb 22 13:20:44 mailserver kernel: CPU:    3
Feb 22 13:20:44 mailserver kernel: EIP:    0060:[<c04e9991>]    Not tainted VLI
Feb 22 13:20:44 mailserver kernel: EFLAGS: 00010292   (2.6.18-1.2868.fc6 #1)
Feb 22 13:20:44 mailserver kernel: EIP is at list_del+0x9/0x6c
Feb 22 13:20:44 mailserver kernel: eax: 00200200   ebx: ecc57e98   ecx: f7fff080
  edx: 00000003
Feb 22 13:20:44 mailserver kernel: esi: ecc57e48   edi: f7e2a540   ebp: 00000282
  esp: c2146f38
Feb 22 13:20:44 mailserver kernel: ds: 007b   es: 007b   ss: 0068
Feb 22 13:20:44 mailserver kernel: Process events/3 (pid: 17, ti=c2146000
task=f7e5f180 task.ti=c2146000)
Feb 22 13:20:44 mailserver kernel: Stack: df4f8764 00000000 df4f8740 ecc57e40
c04bbb5c ecc57e40 ecc57e48 f7e2a540
Feb 22 13:20:44 mailserver kernel:        c04bb7a3 c068b260 c068b264 c0433c18
00000246 f7e2a540 f7e2a560 c04bb6ec
Feb 22 13:20:44 mailserver kernel:        00000000 f7e2a560 f7e2a540 f7e2a558
00000000 c0434508 00000001 00000000
Feb 22 13:20:44 mailserver kernel: Call Trace:
Feb 22 13:20:44 mailserver kernel:  [<c04bbb5c>] keyring_destroy+0x28/0x65
Feb 22 13:20:44 mailserver kernel:  [<c04bb7a3>] key_cleanup+0xb7/0xd0
Feb 22 13:20:44 mailserver kernel:  [<c0433c18>] run_workqueue+0x83/0xc5
Feb 22 13:20:44 mailserver kernel:  [<c0434508>] worker_thread+0xd9/0x10d
Feb 22 13:20:44 mailserver kernel:  [<c04369db>] kthread+0xc0/0xed
Feb 22 13:20:44 mailserver kernel:  [<c0404dab>] kernel_thread_helper+0x7/0x10
Feb 22 13:20:44 mailserver kernel: DWARF2 unwinder stuck at
kernel_thread_helper+0x7/0x10
Feb 22 13:20:44 mailserver kernel: Leftover inexact backtrace:
Feb 22 13:20:44 mailserver kernel:  =======================
Feb 22 13:20:44 mailserver kernel: Code: 8d 46 04 e8 86 00 00 00 8d 4b 0c 8b 51
04 8d 46 0c 83 c4 14 5b 5e 5f e9 72 00 00 00 89 c3 eb e8 90 90 53 89 c3 83 ec 0c
8b 40 04 <8b> 00 39 d8 74 1c 89 5c 24 04 89 44 24 08 c7 04 24 4f 1c 64 c0
Feb 22 13:20:44 mailserver kernel: EIP: [<c04e9991>] list_del+0x9/0x6c SS:ESP
0068:c2146f38
Feb 22 13:20:54 mailserver kernel:  <3>BUG: soft lockup detected on CPU#3!
Feb 22 13:20:54 mailserver kernel:  [<c04051db>] dump_trace+0x69/0x1af
Feb 22 13:20:54 mailserver kernel:  [<c0405339>] show_trace_log_lvl+0x18/0x2c
Feb 22 13:20:54 mailserver kernel:  [<c04058ed>] show_trace+0xf/0x11
Feb 22 13:20:54 mailserver kernel:  [<c04059ea>] dump_stack+0x15/0x17
Feb 22 13:20:54 mailserver kernel:  [<c044da6d>] softlockup_tick+0xad/0xc4
Feb 22 13:20:54 mailserver kernel:  [<c042e57a>] update_process_times+0x39/0x5c
Feb 22 13:20:54 mailserver kernel:  [<c0418914>] smp_apic_timer_interrupt+0x5c/0x64
Feb 22 13:20:54 mailserver kernel:  [<c0404ad3>] apic_timer_interrupt+0x1f/0x24
Feb 22 13:20:54 mailserver kernel: DWARF2 unwinder stuck at
apic_timer_interrupt+0x1f/0x24
Feb 22 13:20:54 mailserver kernel: Leftover inexact backtrace:
Feb 22 13:20:54 mailserver kernel:  [<c06123ff>] __write_lock_failed+0xf/0x20
Feb 22 13:20:54 mailserver kernel:  [<c04e9891>] _raw_write_lock+0x5d/0x74
Feb 22 13:20:54 mailserver kernel:  [<c04bc073>] keyring_publish_name+0x2c/0x6d
Feb 22 13:20:54 mailserver kernel:  [<c04bc0c2>] keyring_instantiate+0xe/0x13
Feb 22 13:20:54 mailserver kernel:  [<c04bb010>]
__key_instantiate_and_link+0x2f/0xa8
Feb 22 13:20:54 mailserver kernel:  [<c04bc283>] keyring_alloc+0x53/0x6a
Feb 22 13:20:54 mailserver kernel:  [<c04bd900>] alloc_uid_keyring+0x4c/0xb2
Feb 22 13:20:54 mailserver kernel:  [<c042e9f1>] alloc_uid+0x95/0x13c
Feb 22 13:20:54 mailserver kernel:  [<c0431850>] set_user+0xb/0x8e
Feb 22 13:20:54 mailserver kernel:  [<c043311b>] sys_setresuid+0x111/0x1dd
Feb 22 13:20:54 mailserver kernel:  [<c0404013>] syscall_call+0x7/0xb
Feb 22 13:20:54 mailserver kernel:  =======================
Comment 1 David Howells 2007-02-23 06:46:34 UTC
I suspect this is a duplicate of bug 7727.  Can you try the patch attached to 
that?
Comment 2 David Howells 2007-02-23 07:05:05 UTC
As you're using an FC6 kernel, you could grab the .2911 kernel from:

http://download.fedora.redhat.com/pub/fedora/linux/core/updates/testing/6/

and try that.  That has the patch included.
Comment 3 Thomas Klute 2007-02-23 07:10:50 UTC
Yes, maybe it's a duplicate.
I could try to apply the patch, but both machines are production systems,
and we can't take them down for testing.
I will update both machines to kernel-2.6.19-1.2911.fc6.

We could setup a test machine with 2.6.18 and do some testing.
But I need advice how to reproduce the crash in keyring_destroy.
Running your script would make the kernel crash in key_alloc?

Comment 4 David Howells 2007-02-23 07:26:55 UTC
> We could setup a test machine with 2.6.18 and do some testing.
> But I need advice how to reproduce the crash in keyring_destroy.
> Running your script would make the kernel crash in key_alloc?

Unless your bug happens to be the one I already know about and have fixed, I 
can't tell you how to reproduce the problem.  I do know, however, that the bug 
I've fixed can happen in either place; it's just that it's relatively 
difficult to force it to happen in keyring_destroy() as that's run lazily.
Comment 5 Thomas Klute 2007-02-23 07:34:01 UTC
OK, then please reject my report as a duplicate. I will reopen it if this
still happens with the new kernel version.

Thanks for the help, and regards!
Thomas
Comment 6 Adrian Bunk 2007-02-24 09:53:53 UTC

*** This bug has been marked as a duplicate of 7727 ***