Bug 83601 - Race condition in HR timers that cause double insertion and hard lockup
Summary: Race condition in HR timers that cause double insertion and hard lockup
Status: RESOLVED OBSOLETE
Alias: None
Product: Timers
Classification: Unclassified
Component: Interval Timers (show other bugs)
Hardware: All Linux
: P1 blocking
Assignee: timers_realtime-clock
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-08-31 18:20 UTC by Itzcak Pechtalt
Modified: 2014-09-02 17:59 UTC (History)
0 users

See Also:
Kernel Version: 2.6.30 - 3.16.1
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg Hard lockup call trace (3.40 KB, text/plain)
2014-08-31 18:20 UTC, Itzcak Pechtalt
Details

Description Itzcak Pechtalt 2014-08-31 18:20:33 UTC
Created attachment 148921 [details]
dmesg Hard lockup call trace

While running 500Mbps load with "QDISK TBF" active (ran by tc qdisk command)we got hard lockup in the system.

See below in the end, code fix suggestion.

This bug is relevant to all elements using HR timers as well!!
Also the bug exists on all kernel versions from 2.6.30 and up to 3.16.1!

Following is the detailed root cause analysis.

From dmesg log (attached) and crash dump, we found that the problem is in hrtimer_start function which tries to re insert a timer which is ALREADY there!!
As result, the timer node points to itself and rb_insert_color function gets into unfinit loop (as "node" and "parent" and "gparent" are all the same entry).

The origin of the bug is in HR timers in file kernel/hrtimer.c:
in "switch_hrtimer_base" function desired behaviour is that  
timer may move to another CPU unless callback is now running.
This is explanation from inside the code:
* We are trying to move timer to new_base.
* However we can't change timer's base while it is running,
* so we keep it on the same CPU. 
Actually, TIMER MAY MOVE EVEN WHEN CALLBACK IS RUNNING!
It's because __remove_hrtimer function overwrites HRTIMER_STATE_CALLBACK flag with  "timer->state = newstate", __remove_hrtimer maybe called in parallel with interrupt in another CPU while the spinlock is released (in __run_hrtimer).
Finally, when a timer is moved to another CPU in the middle of __run_hrtimer, then the last line "timer->state &= ~HRTIMER_STATE_CALLBACK" may delete the HRTIMER_STATE_ENQUEUED flag which is set by another CPU and then next timer insert will insert the timer a second time (as the end of the function only locks CURRENT CPU and the timer maybe already on another CPU).     

Following is the fix patch based on kernel 3.16.1 (just simple):
diff -uNr a/kernel/hrtimer.c b/kernel/hrtimer.c
--- a/kernel/hrtimer.c  2014-08-31 20:59:52.177452123 +0300
+++ b/kernel/hrtimer.c  2014-08-31 21:02:14.972166540 +0300
@@ -941,7 +941,7 @@
        if (!timerqueue_getnext(&base->active))
                base->cpu_base->active_bases &= ~(1 << base->index);
 out:
-       timer->state = newstate;
+       timer->state = (newstate | (timer_state & HRTIMER_STATE_CALLBACK));
 }

 /*

Itzcak Pechtalt
Comment 1 Itzcak Pechtalt 2014-09-02 17:59:02 UTC
Following is the response from Linus on the above:

The patch is whitespace-damaged, but with a small oneliner like this that doesn't much matter (the timer files moved to kernel/time/ during this merge window, so the patch wouldn't apply as-is anyway).

It needs a sign-off (see Documentation/SubmittingPatches), but even more importantly it needs to go to the right people for double-checking.

But the patch is more broken than whitespace and even lack of sign-off. It cannot even have compiled. I'm assuming "timer_state" was intended to be "timer->state". Also, every caller but one already has "HRTIMER_STATE_CALLBACK" set unconditionally or to the old state in "newstate", so I suspect if this patch is the real fix (which I'll leave for Thomas to comment more on), afaik the actual problem can only happen through migrate_hrtimer_list() which uconditionally sets the whole state to HRTIMER_STATE_MIGRATE.

           Linus


And Indeed,  my crash dump occurred on kernel 2.6.30 (RH 6.5 distribution) and in this version function remove_hrtimer  doesn't preserve "HRTIMER_STATE_CALLBACK" flag and triggers the hard lockup.
So this bug was already fixed in version 2.6.35 to set the flag by remove_hrtimer function.

Seems that  migrate_hrtimer_list case isn't a problem because it is called when old CPU is already dead.

SO BUG WAS ALREADY RESOLVED IN 2.6.38 version

Itzcak Pechtalt

Note You need to log in before you can comment on or make changes to this bug.