Bug 106831

Summary: Back-to-back OOM Killer Events Lock Up the Kernel
Product: Process Management
Component: Scheduler
Status: NEW
Severity: normal
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.10.53
Subsystem:
Regression: No
Bisected commit-id:
Reporter: Chris Carday (ccarday)
Assignee: Ingo Molnar (mingo)
CC: szg00000

Description Chris Carday 2015-10-28 23:35:56 UTC
Current kernel behavior:

After the OOM killer kicks in, the kernel picks a task (thread) to kill and sets the TIF_MEMDIE flag on it, meaning "I am dying, so bypass me if another OOM event comes along". A SIGKILL (kill -9) is then sent to each task in that process, using a loop, and the task structs start getting cleaned up.

If a new OOM event then arrives, the kernel again looks for a task to kill, this time one without the dying flag set. If the previous process has not been fully killed and cleaned up yet, it may pick another task in that same process. It then starts the same kill procedure, but the linked lists are in an intermediate state because the previous OOM event has already begun tearing them down. The traversal can therefore loop forever on transient list members.
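The re-selection described above can be illustrated with a small user-space model. This is a sketch under stated assumptions, not kernel code: struct model_task and the model_* functions are hypothetical, the bool memdie field stands in for TIF_MEMDIE, and victim selection is simplified to "first task not marked as dying" (the real OOM killer ranks candidates by a badness score). As in the current code, only the chosen thread is marked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for task_struct
 * (not the kernel's layout): only what this model needs. */
struct model_task {
	int tgid;    /* process (thread group) id */
	bool memdie; /* stands in for the TIF_MEMDIE flag */
};

/* Simplified selection per this report: the first task not already
 * marked as dying. The real OOM killer ranks candidates by badness. */
static struct model_task *model_pick_victim(struct model_task *tasks, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!tasks[i].memdie)
			return &tasks[i];
	return NULL;
}

/* Current behavior as described: mark only the chosen thread. */
static void model_mark_victim_only(struct model_task *victim)
{
	assert(victim != NULL);
	victim->memdie = true;
}

/* Two back-to-back OOM events. Tasks 0..2 are threads of process 100;
 * task 3 is a separate process 200. Returns the tgid picked second. */
int model_second_pick_tgid(void)
{
	struct model_task tasks[4] = {
		{ .tgid = 100 }, { .tgid = 100 }, { .tgid = 100 }, { .tgid = 200 },
	};

	model_mark_victim_only(model_pick_victim(tasks, 4));     /* first event */
	struct model_task *second = model_pick_victim(tasks, 4); /* second event */
	return second != NULL ? second->tgid : -1;
}
```

Here model_second_pick_tgid() returns 100: the second event lands on a sibling thread of the process that is already being killed. In this model, array order decides which unmarked task is picked.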

Proposed fix:

Instead of setting the flag only on the selected task, set it on every task (thread) of the process chosen to be killed. This prevents the process from being chosen a second time.
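A minimal user-space sketch of this idea, assuming a simplified model: struct model_task and the model_* functions are hypothetical, memdie stands in for TIF_MEMDIE, next stands in for the thread-group list, and selection is "first task not marked as dying" rather than badness scoring. The marking loop follows the do { ... } while (t != victim) shape of the patch's while_each_thread loop. The patch in this report also takes tasklist_lock for writing around the loop; the model omits locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for task_struct (not the kernel's layout). */
struct model_task {
	int tgid;                /* process (thread group) id */
	bool memdie;             /* stands in for the TIF_MEMDIE flag */
	struct model_task *next; /* next thread in the same group (circular) */
};

/* Simplified selection: the first task not already marked as dying. */
static struct model_task *model_pick_victim(struct model_task *tasks, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!tasks[i].memdie)
			return &tasks[i];
	return NULL;
}

/* Proposed fix: walk the victim's whole circular thread list and mark
 * every member, stopping when the walk returns to the victim. */
static void model_mark_thread_group(struct model_task *victim)
{
	struct model_task *t = victim;

	assert(victim != NULL);
	do {
		t->memdie = true;
	} while ((t = t->next) != victim);
}

/* Tasks 0..2 are threads of process 100; task 3 is process 200.
 * Returns the tgid picked by the second OOM event. */
int model_second_pick_tgid_fixed(void)
{
	struct model_task tasks[4] = {
		{ .tgid = 100 }, { .tgid = 100 }, { .tgid = 100 }, { .tgid = 200 },
	};

	/* Link threads 0..2 into one circular group; task 3 is alone. */
	tasks[0].next = &tasks[1];
	tasks[1].next = &tasks[2];
	tasks[2].next = &tasks[0];
	tasks[3].next = &tasks[3];

	model_mark_thread_group(model_pick_victim(tasks, 4));    /* first event */
	struct model_task *second = model_pick_victim(tasks, 4); /* second event */
	return second != NULL ? second->tgid : -1;
}
```

With the whole group marked, model_second_pick_tgid_fixed() returns 200: the second event moves on to a different process instead of re-entering the one already being killed.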

--- linux-3.10.53/mm/oom_kill.c	2014-08-13 21:24:29.000000000 -0400
+++ linux-3.10.53-working/mm/oom_kill.c	2015-10-28 16:39:28.274157000 -0400
@@ -501,7 +501,15 @@
 		}
 	rcu_read_unlock();
 
-	set_tsk_thread_flag(victim, TIF_MEMDIE);
+	write_lock(&tasklist_lock);
+	t = victim;
+
+	do {
+		set_tsk_thread_flag(t, TIF_MEMDIE);
+	} while_each_thread(victim, t);
+
+	write_unlock(&tasklist_lock);
+
 	do_send_sig_info(SIGKILL, SEND_SIG_FORCED, victim, true);
 	put_task_struct(victim);
 }

Comment 1 Chris Carday 2015-10-28 23:46:27 UTC
Specifically, the bug is that the task_struct->thread_group list is being traversed and reaches a member that has "t->next = t", so the code loops forever there while holding tasklist_lock.
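The failure mode in this comment can be sketched in user space. The sketch assumes that in 3.10, while_each_thread(g, t) advances t with next_thread(t) and exits only when t equals g again; under that assumption, a member whose next pointer refers to itself (and is not g) traps the loop. struct model_task, model_walk_group, the demo functions, and the max_steps guard are hypothetical; the guard exists only so the model can report the runaway walk instead of hanging.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in: only the thread-group link matters here. */
struct model_task {
	struct model_task *next; /* plays the role of thread_group.next */
};

/* Walk a group in the shape of do { ... } while_each_thread(g, t),
 * assuming the loop stops only when the next thread is g again.
 * max_steps is a model-only guard (the kernel loop has none).
 * Returns the number of members visited, or -1 if the guard trips. */
int model_walk_group(struct model_task *g, int max_steps)
{
	struct model_task *t = g;
	int visited = 0;

	assert(g != NULL);
	do {
		if (++visited > max_steps)
			return -1; /* in the kernel: spins forever under tasklist_lock */
	} while ((t = t->next) != g);

	return visited;
}

/* Healthy group of three threads: the walk returns to g. */
int model_demo_healthy(void)
{
	struct model_task t[3];

	t[0].next = &t[1];
	t[1].next = &t[2];
	t[2].next = &t[0];
	return model_walk_group(&t[0], 16);
}

/* Member t[1] points to itself ("t->next = t"), so the walk never
 * gets back to g = &t[0]. */
int model_demo_self_loop(void)
{
	struct model_task t[3];

	t[0].next = &t[1];
	t[1].next = &t[1];
	t[2].next = &t[0];
	return model_walk_group(&t[0], 16);
}
```

model_demo_healthy() returns 3, while model_demo_self_loop() returns -1 because the guard trips on the self-referencing member.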