Most recent kernel where this bug did not occur: --
Distribution: Fedora Core 6
Hardware Environment: Intel Xeon Dual Core (Pentium D 925 3000MHz), Dell PowerEdge SC440
Software Environment: samba-3.0.23, apache-2, openssh...

Problem Description:
The system suddenly locks up. It still answers pings but nothing else (samba, ssh and apache stop responding), and a reboot is needed. dmesg shows an oops (attached at bottom).

Steps to reproduce:
Unable to say. The system has worked OK for 3+ months and this has happened just once, yesterday. Cannot test because the machine is in production.

Relevant dmesg output:

Sep 27 10:45:00 svrdlv kernel: Unable to handle kernel NULL pointer dereference at 0000000000000010 RIP:
Sep 27 10:45:00 svrdlv kernel: [<ffffffff80222a56>] rb_erase+0x14c/0x2aa
Sep 27 10:45:00 svrdlv kernel: PGD 0
Sep 27 10:45:00 svrdlv kernel: Oops: 0000 [1] SMP
Sep 27 10:45:00 svrdlv kernel: last sysfs file: /class/net/eth0/address
Sep 27 10:45:00 svrdlv kernel: CPU 0
Sep 27 10:45:00 svrdlv kernel: Modules linked in: nls_utf8 cifs autofs4 ipv6 dm_mirror dm_multipath dm_mod video sbs i2c_ec button battery asus_acpi ac parport_pc lp parport intel_rng sg ide_cd serio_raw cdrom i2c_i801 i2c_core shpchp tg3 pcspkr ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Sep 27 10:45:00 svrdlv kernel: Pid: 8, comm: events/0 Tainted: G M 2.6.18-1.2798.fc6 #1
Sep 27 10:45:00 svrdlv kernel: RIP: 0010:[<ffffffff80222a56>] [<ffffffff80222a56>] rb_erase+0x14c/0x2aa
Sep 27 10:45:00 svrdlv kernel: RSP: 0018:ffff810037f49de8 EFLAGS: 00010282
Sep 27 10:45:00 svrdlv kernel: RAX: 0000000000000000 RBX: ffff810035215a48 RCX: ffff8100384f15c8
Sep 27 10:45:00 svrdlv kernel: RDX: 0000000000000000 RSI: ffffffff806e45e0 RDI: 0000000000000000
Sep 27 10:45:00 svrdlv kernel: RBP: ffffffff806e45e0 R08: ffff810035215448 R09: ffff81003fe2aa80
Sep 27 10:45:00 svrdlv kernel: R10: 0000000000000000 R11: 00007fff9d365bb0 R12: ffff81003fe2aa80
Sep 27 10:45:00 svrdlv kernel: R13: 0000000000000282 R14: 0000000000000000 R15: ffffffff803125ff
Sep 27 10:45:00 svrdlv kernel: FS: 0000000000000000(0000) GS:ffffffff80609000(0000) knlGS:0000000000000000
Sep 27 10:45:00 svrdlv kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Sep 27 10:45:00 svrdlv kernel: CR2: 0000000000000010 CR3: 0000000007083000 CR4: 00000000000006e0
Sep 27 10:45:00 svrdlv kernel: Process events/0 (pid: 8, threadinfo ffff810037f48000, task ffff810037fef7d0)
Sep 27 10:45:00 svrdlv kernel: Stack: ffff8100384f1980 ffff8100384f1988 ffffffff80312659 0000000000000282
Sep 27 10:45:00 svrdlv kernel: ffffffff8056c280 ffffffff8056c288 ffffffff8024bf5f ffffffff8024884a
Sep 27 10:45:00 svrdlv kernel: ffff81003fe2aa80 ffffffff8024884a ffff81003fe15c80 ffff81003fe15cf0
Sep 27 10:45:00 svrdlv kernel: Call Trace:
Sep 27 10:45:00 svrdlv kernel: [<ffffffff80312659>] key_cleanup+0x5a/0xfb
Sep 27 10:45:00 svrdlv kernel: [<ffffffff8024bf5f>] run_workqueue+0x9a/0xed
Sep 27 10:45:00 svrdlv kernel: [<ffffffff8024893a>] worker_thread+0xf0/0x122
Sep 27 10:45:00 svrdlv kernel: [<ffffffff80232843>] kthread+0xf6/0x12a
Sep 27 10:45:00 svrdlv kernel: [<ffffffff8025cea5>] child_rip+0xa/0x11
Sep 27 10:45:00 svrdlv kernel: DWARF2 unwinder stuck at child_rip+0xa/0x11
Sep 27 10:45:00 svrdlv kernel: Leftover inexact backtrace:
Sep 27 10:45:00 svrdlv kernel: [<ffffffff8023274d>] kthread+0x0/0x12a
Sep 27 10:45:00 svrdlv kernel: [<ffffffff8025ce9b>] child_rip+0x0/0x11
Sep 27 10:45:00 svrdlv kernel:
Sep 27 10:45:00 svrdlv kernel:
Sep 27 10:45:00 svrdlv kernel: Code: 48 8b 4f 10 48 85 c9 74 07 48 8b 01 a8 01 74 17 48 8b 47 08
Sep 27 10:45:00 svrdlv kernel: RIP [<ffffffff80222a56>] rb_erase+0x14c/0x2aa
Sep 27 10:45:00 svrdlv kernel: RSP <ffff810037f49de8>
Sep 27 10:45:00 svrdlv kernel: CR2: 0000000000000010
Tainted: G M

The 'M' indicates that your CPU reported a machine check exception, which is nearly always a sign of hardware failure. It could be bad RAM, insufficient power, poor cooling, etc.

You're also running an *ancient* kernel with known security vulnerabilities, and many, many bugs have been fixed since then. If this oops did have a non-hardware-related cause, it may even have been fixed at some point in the year since that kernel was released. (2.6.20 fixed a security hole in the key handling code that would cause corrupted lists, which may have manifested like this.)
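
For anyone reading a similar oops, here is a rough Python sketch of how the taint letters on the "Pid:" line could be decoded. The function name `decode_taint` and the `TAINT_FLAGS` table are illustrative and not part of any kernel tool. The table is assumed to cover only the letters printed by kernels of this era, and the descriptions are paraphrased; newer kernels define more flags.

```python
import re

# Assumed subset of taint letters from 2.6-era kernels; descriptions are
# paraphrased, and later kernels print additional letters not listed here.
TAINT_FLAGS = {
    "P": "proprietary module loaded",
    "G": "only GPL-compatible modules loaded",
    "F": "module was force-loaded",
    "S": "SMP kernel on hardware not designed for SMP",
    "R": "module was force-unloaded",
    "M": "machine check exception reported",
    "B": "bad page state encountered",
}


def decode_taint(line):
    """Return (letter, meaning) pairs for the taint flags found in an oops line."""
    # Capture the run of single capital letters that follows "Tainted:".
    match = re.search(r"Tainted:\s+([A-Z](?:\s+[A-Z])*)\b", line)
    if match is None:
        return []  # e.g. an untainted kernel, or a line without taint info
    return [(flag, TAINT_FLAGS.get(flag, "unknown flag"))
            for flag in match.group(1).split()]


if __name__ == "__main__":
    oops_line = "Pid: 8, comm: events/0 Tainted: G M 2.6.18-1.2798.fc6 #1"
    for flag, meaning in decode_taint(oops_line):
        print(flag, "-", meaning)
```

This only helps with reading the flags. An 'M' still needs to be followed up by checking the hardware itself.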
(In reply to comment #1)
> Tainted: G M
>
> The 'm' is indication that your CPU reported a machine check exception.
> Nearly always the sign of hardware failure.
> Could be bad ram, insufficient power, poor cooling etc. etc.
>
> You're also running an *ancient* kernel with known security vulnerabilities,
> and many, many bugs fixed since then. If this oops did have a
> non-hardware-related cause, it may even have been fixed at some point in the
> year since that kernel was released. (2.6.20 fixed a security hole in the
> key handling code that would cause corrupted lists that may have manifested
> like this).

Thanks, sorry then for the mess.

Pedro.