Bug 5762

Summary: sigkill leaves process running with RSS=0 in 2.6.15-rc5
Product: Process Management Reporter: Alan Somers (asomers)
Component: Other Assignee: process_other
Status: REJECTED INVALID    
Severity: normal CC: airlied, akpm, bunk, kernel
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.15-rc5 Subsystem:
Regression: --- Bisected commit-id:
Attachments: profile after X has crashed
profile after 'killall -9 X'

Description Alan Somers 2005-12-19 20:13:47 UTC
Most recent kernel where this bug did not occur: unknown

Distribution: gentoo

Hardware Environment: dual opteron 244, 1GB RAM, Tyan S2885, Radeon 7500

Software Environment: 64-bit kernel, X.org 6.8.2-r6, Unreal Tournament 2004
demo, open source radeon driver

Problem Description: The Unreal Tournament 2004 demo crashed and took out X.
I SSHed in and killed ut2004-bin with kill -9, but X did not respond to kill,
so I tried kill -9 on X.  Instead of killing X, this put X into a curious
state.  ps shows that X is running, but top shows it to have
RES = SHR = VIRT = 0:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND         
 7033 root      25   0     0    0    0 R 98.7  0.0  22:40.70 X

ps -ax shows
 7033 ?        R     22:53 [X]
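
The same symptoms (state R, VIRT and RES both 0) can also be read directly
from /proc/<pid>/stat rather than from top and ps.  The following is only a
sketch: the function names are made up for illustration, the field positions
follow proc(5), and the "stuck" test is a heuristic that matches the output
above, not a diagnosis of the cause.

```python
from pathlib import Path


def parse_stat(stat_line):
    """Return (comm, state, vsize_bytes, rss_pages) from a /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the last ')' instead of splitting the whole line on whitespace.
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    comm = stat_line[open_paren + 1:close_paren]
    fields = stat_line[close_paren + 1:].split()
    # fields[0] is field 3 (state) in proc(5); vsize is field 23, rss is field 24.
    state = fields[0]
    vsize = int(fields[20])
    rss = int(fields[21])
    return comm, state, vsize, rss


def looks_like_zero_memory_runner(stat_line):
    """Heuristic (an assumption): runnable task with no virtual or resident memory."""
    _, state, vsize, rss = parse_stat(stat_line)
    return state == "R" and vsize == 0 and rss == 0


def check_pid(pid):
    """Apply the heuristic to a live process; needs a Linux /proc."""
    return looks_like_zero_memory_runner(Path(f"/proc/{pid}/stat").read_text())
```

For the process shown above, check_pid(7033) would be the call to make.
Kernel threads also report zero memory, so a True result is only meaningful
for a PID that is known to have been a userspace process.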

The system does not respond to the keyboard, and typing "chvt 2" in an ssh
window does not work either.  chvt hangs, and strace shows that it hangs at this
line:
open("/dev/tty0", O_RDWR)               = 3
ioctl(3, KDGKBTYPE, 0x7fffff86537b)     = 0
ioctl(3, VIDIOC_G_COMP or VT_ACTIVATE, 0x2) = 0
ioctl(3, VIDIOC_S_COMP or VT_WAITACTIVE

Attempting to attach gdb to X as root gives the following error:
Attaching to program: /usr/bin/X, process 6850
ptrace: Operation not permitted.

Xorg.0.log shows nothing out of the ordinary.

I have reproduced this with kernels gentoo-sources-2.6.14 and
gentoo-sources-2.6.14-r3 as well as the regular 2.6.15-rc5.

Steps to reproduce: Play the ut2004 demo for 2-3 hours, and the game will crash
and X will crash with it.  SSH in and run top.  X will be consuming 99% of the
CPU but otherwise looks normal.  Do killall -9 X and run top again.  X will now
be in the strange state described above.
Comment 1 Daniel Drake 2005-12-20 00:25:44 UTC
Downstream bug: http://bugs.gentoo.org/115905
Comment 2 Alan Somers 2005-12-21 07:53:58 UTC
It looks like 2.6.11-rc5-mm1 is immune.  I've crashed the game twice but
neither time has X crashed.
Comment 3 Alan Somers 2005-12-21 20:09:05 UTC
No, 2.6.11-rc5-mm1 is vulnerable.  I must have discovered another bug in the
game earlier.

More details on how to trigger the crash: it has only happened in the
ONS-Torlan level (which I play most often anyway because it's the best), with
me and either 7 or 9 bots, and on at least 2 out of 4 occasions I was standing
in the center river channel under the communications tower.
Comment 4 Andrew Morton 2006-01-19 03:52:40 UTC
I'd suspect that X has put the video card into a weird state and the driver has
hung up.

Can you please generate a kernel profile of the system when the X server 
is in its hung-up state?  Documentation/basic_profiling.txt will
help - it'll tell us where the kernel got stuck.

Thanks.
Comment 5 Alan Somers 2006-01-31 20:33:15 UTC
Here are two profiles.  The profile-start was generated after the game and X
crashed, but before I killed X.  After I did 'killall -9 X' and X went to RSS=0,
I cleared the profile and then generated profile-postkill.

This was done in 2.6.15-rc5.  I was unable to reproduce the bug in 2.6.15 even
with several hours of gameplay.
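
To see where the kernel was spending its time in each capture, the profile
text can be ranked by tick count.  This is a rough sketch that assumes the
attachments are plain readprofile output, one line per symbol in the form
"ticks symbol load" followed by a final "total" line.  The function names are
illustrative, and the attached files were not inspected to confirm the layout.

```python
def parse_profile(text):
    """Map symbol name -> tick count from readprofile-style text (assumed layout)."""
    ticks = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines and anything that does not start with a tick count.
        if len(parts) < 2 or not parts[0].isdigit():
            continue
        # The last line of readprofile output is a grand total, not a symbol.
        if parts[1] == "total":
            continue
        ticks[parts[1]] = ticks.get(parts[1], 0) + int(parts[0])
    return ticks


def hottest(text, n=10):
    """Return the n symbols with the most ticks, highest first."""
    ranked = sorted(parse_profile(text).items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]


# Example use, with the file name taken from the comment above:
#   with open("profile-postkill") as f:
#       for symbol, count in hottest(f.read()):
#           print(count, symbol)
```

Comparing the top entries of profile-start and profile-postkill would show
whether the hot spot moved after the kill.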
Comment 6 Alan Somers 2006-01-31 20:34:05 UTC
Created attachment 7197 [details]
profile after X has crashed
Comment 7 Alan Somers 2006-01-31 20:34:45 UTC
Created attachment 7198 [details]
profile after 'killall -9 X'
Comment 8 Dave Airlie 2006-02-02 02:13:07 UTC
Welcome to a 3D GPU crash, have a nice day :-)

The kernel should make very little difference to this crashing or not, the DRI
drivers are more of a culprit...