Bug 5762 - sigkill leaves process running with RSS=0 in 2.6.15-rc5
Summary: sigkill leaves process running with RSS=0 in 2.6.15-rc5
Alias: None
Product: Process Management
Classification: Unclassified
Component: Other
Hardware: i386 Linux
Importance: P2 normal
Assignee: process_other
Depends on:
Reported: 2005-12-19 20:13 UTC by Alan Somers
Modified: 2006-11-09 04:53 UTC
CC List: 4 users

See Also:
Kernel Version: 2.6.15-rc5
Tree: Mainline
Regression: ---

profile after X has crashed (15.49 KB, text/plain)
2006-01-31 20:34 UTC, Alan Somers
profile after 'killall -9 X' (359 bytes, text/plain)
2006-01-31 20:34 UTC, Alan Somers

Description Alan Somers 2005-12-19 20:13:47 UTC
Most recent kernel where this bug did not occur: unknown

Distribution: gentoo

Hardware Environment: dual opteron 244, 1GB RAM, Tyan S2885, Radeon 7500

Software Environment: 64-bit kernel, X.org 6.8.2-r6, Unreal Tournament 2004
demo, open source radeon driver

Problem Description: The Unreal Tournament 2004 demo crashed and took out X.  I
SSHed in and killed ut2004-bin with kill -9, but X did not respond to kill, so I
tried kill -9 on X.  Instead of killing X, this put X into a curious state.  ps
shows that X is running, but top shows it to have RES = SHR = VIRT = 0:
 7033 root      25   0     0    0    0 R 98.7  0.0  22:40.70 X

ps -ax shows:
 7033 ?        R     22:53 [X]
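
The same symptoms (state R, no memory figures) can also be read directly from
/proc.  The following is a minimal Python sketch and was not part of the
original report.  The function name summarize_status is hypothetical, and the
sketch assumes that the kernel omits the Vm* lines from /proc/<pid>/status once
a task has released its address space, which would match the RSS=0 and the
bracketed [X] seen above.

```python
def summarize_status(status_text):
    """Summarize the text of /proc/<pid>/status (hypothetical helper).

    Returns the process name, the one-letter scheduler state, the resident
    set size in kB (None if no VmRSS line is present), and a flag that
    reports whether any memory information was found at all.
    """
    fields = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()

    # "State:" looks like "R (running)"; keep only the letter.
    state_field = fields.get("State", "")
    state = state_field.split()[0] if state_field else "?"

    # "VmRSS:" looks like "1234 kB"; assumed absent once the mm is gone.
    rss_field = fields.get("VmRSS")
    rss_kb = int(rss_field.split()[0]) if rss_field else None

    return {
        "name": fields.get("Name"),
        "state": state,
        "rss_kb": rss_kb,
        "has_mm": rss_field is not None,
    }
```

To use it, pass in the contents of /proc/<pid>/status for the PID of interest,
for example summarize_status(open("/proc/7033/status").read()).  A result with
state "R" and has_mm False would correspond to the condition reported here.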

The system does not respond to the keyboard, and typing "chvt 2" in an ssh
window does not work either.  chvt hangs, and strace shows that it hangs at this
point:
open("/dev/tty0", O_RDWR)               = 3
ioctl(3, KDGKBTYPE, 0x7fffff86537b)     = 0
ioctl(3, VIDIOC_G_COMP or VT_ACTIVATE, 0x2) = 0

Attempting to attach gdb to X as root gives the following error:
Attaching to program: /usr/bin/X, process 6850
ptrace: Operation not permitted.

Xorg.0.log shows nothing out of the ordinary.

I have reproduced this with kernels gentoo-sources-2.6.14 and
gentoo-sources-2.6.14-r3, as well as the regular 2.6.15-rc5.

Steps to reproduce: Play the ut2004 demo for 2-3 hours, and the game will crash
and X will crash with it.  SSH in and run top.  X will be consuming 99% of the
CPU, but otherwise looks normal.  Do killall -9 X and run top again.  X will now
be in the strange state described above.
Comment 1 Daniel Drake 2005-12-20 00:25:44 UTC
Downstream bug: http://bugs.gentoo.org/115905
Comment 2 Alan Somers 2005-12-21 07:53:58 UTC
It looks like 2.6.11-rc5-mm1 is immune.  I've crashed the game twice, but
neither time has X crashed.
Comment 3 Alan Somers 2005-12-21 20:09:05 UTC
No, 2.6.11-rc5-mm1 is vulnerable.  I must have discovered another bug in the
game earlier.

More details on how to trigger the crash:  It has only happened in the
ONS-Torlan level (which I play most often anyway because it's the best), with me
and either 7 or 9 bots, and on at least 2 out of 4 occasions I was standing in
the center river channel under the communications tower.
Comment 4 Andrew Morton 2006-01-19 03:52:40 UTC
I'd suspect that X has put the video card into a weird state and the driver has
hung up.

Can you please generate a kernel profile of the system when the X server 
is in its hung-up state?  Documentation/basic_profiling.txt will
help - it'll tell us where the kernel got stuck.
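
A captured profile can be condensed to its hottest symbols before it is read or
attached.  This is a minimal Python sketch, assuming the profile is readprofile
output with three whitespace-separated columns (ticks, symbol name, normalized
load) and a final "total" line.  The helper name top_symbols is hypothetical.

```python
def top_symbols(profile_text, limit=10):
    """Return the `limit` busiest (ticks, symbol) pairs from a profile.

    Assumes each data line starts with an integer tick count followed by a
    symbol name; lines that do not fit that shape are skipped.
    """
    rows = []
    for line in profile_text.splitlines():
        parts = line.split()
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # blank line or unexpected format
        ticks, symbol = int(parts[0]), parts[1]
        if symbol == "total":
            continue  # summary row, not a real symbol
        rows.append((ticks, symbol))

    # Highest tick count first: these are where the kernel spent its time.
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:limit]
```

Feeding it the text of a saved profile file gives a short list of the functions
where the kernel spent the most ticks while X was hung.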

Comment 5 Alan Somers 2006-01-31 20:33:15 UTC
Here are two profiles.  profile-start was generated after the game and X
crashed, but before I killed X.  After I did 'killall -9 X' and X went to RSS=0,
I cleared the profile and then generated profile-postkill.

This was done in 2.6.15-rc5.  I was unable to reproduce the bug in 2.6.15, even
with several hours of gameplay.
Comment 6 Alan Somers 2006-01-31 20:34:05 UTC
Created attachment 7197
profile after X has crashed
Comment 7 Alan Somers 2006-01-31 20:34:45 UTC
Created attachment 7198
profile after 'killall -9 X'
Comment 8 Dave Airlie 2006-02-02 02:13:07 UTC
Welcome to a 3D GPU crash, have a nice day :-)

The kernel version should make very little difference to whether this crashes
or not; the DRI drivers are more likely the culprit...
