Bug 37322 - high load, unresponsive system spinning in rcu_kthread
Summary: high load, unresponsive system spinning in rcu_kthread
Status: RESOLVED OBSOLETE
Alias: None
Product: Process Management
Classification: Unclassified
Component: Scheduler
Hardware: All
OS: Linux
Importance: P1 normal
Assignee: Ingo Molnar
URL:
Keywords:
Depends on:
Blocks: 32012
 
Reported: 2011-06-12 14:53 UTC by Ferenc Wágner
Modified: 2013-12-23 13:47 UTC
CC List: 5 users

See Also:
Kernel Version: 3.0.0-rc6
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
somewhat pared down syslog from bootup to shutdown (80.35 KB, text/plain)
2011-06-12 14:53 UTC, Ferenc Wágner
config of the affected kernel (2.6.38.3 with an identical config never produced this issue) (58.04 KB, text/plain)
2011-06-12 14:55 UTC, Ferenc Wágner
Diagnostic patch for TINY_RCU's presumed wakeup storms. (939 bytes, patch)
2011-06-13 23:18 UTC, Paul E. McKenney
screenshot of 'top' and syslog shortly before the system recovered (34 bytes, text/plain)
2011-06-17 14:36 UTC, Ferenc Wágner
RCU trace output around system breakdown (gzip compressed) (55.58 KB, application/octet-stream)
2011-07-21 17:17 UTC, Ferenc Wágner

Description Ferenc Wágner 2011-06-12 14:53:34 UTC
Created attachment 61672 [details]
somewhat pared down syslog from bootup to shutdown

After upgrading from 2.6.38.3 to 2.6.39.1 on my laptop and using it for a couple of hours, the system suddenly became very unresponsive while I was chatting in Pidgin. The mouse cursor hardly moved, but it was still possible to interact with the system, only very, very slowly. Starting top on a text console took several minutes, and it then showed a load of 18, with rcu_kthread spinning the CPU most of the time; occasionally other kernel threads (kworker) or applications (pidgin) showed up with similarly high CPU percentages for a short while. Sysrq worked; you can find the list of blocked tasks in the attached syslog. Shutting down the system took almost an hour, but otherwise showed no errors.
A quick web search turned up http://thread.gmane.org/gmane.linux.kernel/1150098/focus=1150186, which looks very similar: mine is also a uniprocessor non-preempt kernel (config attached).
Comment 1 Ferenc Wágner 2011-06-12 14:55:01 UTC
Created attachment 61682 [details]
config of the affected kernel (2.6.38.3 with an identical config never produced this issue)
Comment 2 Paul E. McKenney 2011-06-13 23:18:37 UTC
Created attachment 61882 [details]
Diagnostic patch for TINY_RCU's presumed wakeup storms.

Hmmm...  Not much change in TINY_RCU between 2.6.38 and 2.6.39.

Nevertheless, if there is a problem, the most likely cause is some sort of wakeup storm.  Could you please try out this diagnostic patch?  It should do a once-per-10-seconds printk() to check up on the wakeup-storm assumption.
Comment 3 Ferenc Wágner 2011-06-14 08:30:17 UTC
Thanks for looking into this! I backported your patch in attachment 61882 [details] to 2.6.39.1 like this:

diff --git a/kernel/rcutiny.c b/kernel/rcutiny.c
index 0c343b9..ebfa4bf 100644
--- a/kernel/rcutiny.c
+++ b/kernel/rcutiny.c
@@ -213,11 +213,20 @@ static int rcu_kthread(void *arg)
  */
 static void invoke_rcu_kthread(void)
 {
+       static unsigned long oldjiffies;
+       static unsigned long invokecount;
        unsigned long flags;
 
        local_irq_save(flags);
        have_rcu_kthread_work = 1;
        wake_up(&rcu_kthread_wq);
+       invokecount++;
+       if (UINT_CMP_GE(jiffies, oldjiffies + 10 * HZ)) {
+               printk(KERN_ALERT "invoke_rcu_kthread(): %lu invocations/s",
+                      invokecount / 10);
+               oldjiffies = jiffies;
+               invokecount = 0;
+       }
        local_irq_restore(flags);
 }

Hope it's acceptable. Booting with it shows the following:

wferi@szonett:~$ dmesg|fgrep rcu
[  310.523674] invoke_rcu_kthread(): 959 invocations/s
[  321.496407] invoke_rcu_kthread(): 7 invocations/s
[  333.496689] invoke_rcu_kthread(): 1 invocations/s
[  343.508377] invoke_rcu_kthread(): 1 invocations/s
[  353.912314] invoke_rcu_kthread(): 2 invocations/s
[  363.943577] invoke_rcu_kthread(): 2 invocations/s
[  373.944206] invoke_rcu_kthread(): 1 invocations/s
[  383.944860] invoke_rcu_kthread(): 2 invocations/s
[  393.945506] invoke_rcu_kthread(): 1 invocations/s
[  403.946143] invoke_rcu_kthread(): 1 invocations/s
[  413.946796] invoke_rcu_kthread(): 1 invocations/s
[  423.947459] invoke_rcu_kthread(): 2 invocations/s
[  433.948121] invoke_rcu_kthread(): 1 invocations/s
[  443.948690] invoke_rcu_kthread(): 14 invocations/s
[  453.949253] invoke_rcu_kthread(): 36 invocations/s
[  463.949801] invoke_rcu_kthread(): 23 invocations/s
[  473.950289] invoke_rcu_kthread(): 63 invocations/s
[  483.950788] invoke_rcu_kthread(): 49 invocations/s
[  493.951303] invoke_rcu_kthread(): 23 invocations/s
[  503.952004] invoke_rcu_kthread(): 12 invocations/s
[  513.952603] invoke_rcu_kthread(): 10 invocations/s
[  523.953099] invoke_rcu_kthread(): 20 invocations/s
[  533.953716] invoke_rcu_kthread(): 17 invocations/s
[  543.954314] invoke_rcu_kthread(): 21 invocations/s
[  553.954789] invoke_rcu_kthread(): 15 invocations/s
[  563.955380] invoke_rcu_kthread(): 3 invocations/s
[  573.955967] invoke_rcu_kthread(): 10 invocations/s
[  583.955423] invoke_rcu_kthread(): 23 invocations/s
[  593.956965] invoke_rcu_kthread(): 10 invocations/s
[  603.957667] invoke_rcu_kthread(): 2 invocations/s
[  613.958251] invoke_rcu_kthread(): 2 invocations/s
[  623.958932] invoke_rcu_kthread(): 2 invocations/s
[  633.959554] invoke_rcu_kthread(): 2 invocations/s
[  643.960229] invoke_rcu_kthread(): 1 invocations/s
[  653.960806] invoke_rcu_kthread(): 2 invocations/s
[  663.961514] invoke_rcu_kthread(): 2 invocations/s
[  673.962195] invoke_rcu_kthread(): 1 invocations/s
[  683.962732] invoke_rcu_kthread(): 26 invocations/s
[  693.963290] invoke_rcu_kthread(): 9 invocations/s
[  703.963999] invoke_rcu_kthread(): 2 invocations/s
[  713.964699] invoke_rcu_kthread(): 1 invocations/s
[  723.965283] invoke_rcu_kthread(): 1 invocations/s
[  733.965950] invoke_rcu_kthread(): 1 invocations/s

I wonder why it took 5 minutes for the first message to appear...
Anyway, the problem hasn't manifested itself yet, and I haven't got a reliable way to reproduce it. But hopefully it will reproduce itself quickly.
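
For reference, a plausible explanation for the 5-minute delay: the kernel initializes jiffies to INITIAL_JIFFIES, chosen so that the 32-bit counter wraps about 300 seconds after boot, while the patch's static oldjiffies starts at 0. UINT_CMP_GE(jiffies, oldjiffies + 10 * HZ) therefore stays false until jiffies has wrapped past 10 * HZ, i.e. around 310 seconds of uptime, which matches the first dmesg line above. The user-space sketch below illustrates the arithmetic; it assumes a 32-bit kernel (unsigned long modelled with uint32_t), HZ=250, and the mainline definitions of INITIAL_JIFFIES and UINT_CMP_GE from that era.

#include <limits.h>
#include <stdint.h>
#include <stdio.h>

#define HZ 250                                         /* assumed config value */
#define INITIAL_JIFFIES ((uint32_t)(-300 * HZ))        /* linux/jiffies.h, 32-bit */
#define UINT_CMP_GE(a, b) (UINT_MAX / 2 >= (a) - (b))  /* linux/rcupdate.h */

int main(void)
{
	uint32_t oldjiffies = 0;            /* the patch's static variable */
	uint32_t jiffies = INITIAL_JIFFIES; /* counter value right after boot */

	/* Right after boot, jiffies - 10*HZ is huge, so the test is false... */
	printf("at boot:     %d\n", UINT_CMP_GE(jiffies, oldjiffies + 10 * HZ));

	/* ...and only after ~310 s has the counter wrapped past 10*HZ. */
	jiffies += 310 * HZ;
	printf("after 310 s: %d\n", UINT_CMP_GE(jiffies, oldjiffies + 10 * HZ));
	return 0;
}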
Comment 4 Ferenc Wágner 2011-06-17 14:26:17 UTC
Okay, I hit the issue again. No idea what provoked it; I had just been chatting away in Pidgin for quite some time (but the system previously survived much longer Pidgin sessions after the last reboot). Interestingly, the system also recovered this time after 20 minutes of struggling, shortly after I managed to quit Pidgin... This didn't happen when I first encountered the issue. Anyway, under normal system operation I mostly get 2 or 3 invoke_rcu_kthread() invocations per second. While the system was unusable due to rcu_kthread spinning the CPU, I got only 0 or 1 invocations per second. This doesn't look like a wakeup storm to me (or I made an error adapting the diagnostic patch to 2.6.39.1).
Comment 5 Ferenc Wágner 2011-06-17 14:36:12 UTC
Created attachment 62562 [details]
screenshot of 'top' and syslog shortly before the system recovered
Comment 6 Ferenc Wágner 2011-07-01 08:28:56 UTC
I went to 3.0.0-rc5 and compiled in the RCU torture test alongside your original diagnostic patch. After an "echo 1 >rcutorture_runnable" I started to get around 200 invoke_rcu_kthread() invocations per second instead of the usual 2-3, and a load slightly above 3. However, after "echo 0 >rcutorture_runnable", no torture summary appeared in dmesg, contrary to the torture.txt documentation:

$ dmesg|grep rcu
[    0.217627] rcu-torture:--- Start of test: nreaders=2 nfakewriters=4 stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4
[  311.588621] invoke_rcu_kthread(): 1256 invocations/s
[  321.838029] invoke_rcu_kthread(): 1 invocations/s
[  331.928567] invoke_rcu_kthread(): 1 invocations/s
[  342.989542] invoke_rcu_kthread(): 64 invocations/s
[  352.990217] invoke_rcu_kthread(): 2 invocations/s
[  362.990812] invoke_rcu_kthread(): 1 invocations/s
[  372.991529] invoke_rcu_kthread(): 1 invocations/s
[...]
[ 1953.094323] invoke_rcu_kthread(): 2 invocations/s
[ 1963.095013] invoke_rcu_kthread(): 2 invocations/s
[ 1973.095697] invoke_rcu_kthread(): 1 invocations/s
[ 1983.096405] invoke_rcu_kthread(): 2 invocations/s
[ 1993.097116] invoke_rcu_kthread(): 2 invocations/s
[ 2003.097696] invoke_rcu_kthread(): 3 invocations/s
[ 2013.098269] invoke_rcu_kthread(): 2 invocations/s
[ 2023.098926] invoke_rcu_kthread(): 10 invocations/s
[ 2033.099521] invoke_rcu_kthread(): 28 invocations/s
[ 2043.100221] invoke_rcu_kthread(): 4 invocations/s
[ 2053.100014] invoke_rcu_kthread(): 86 invocations/s
[ 2063.100141] invoke_rcu_kthread(): 212 invocations/s
[ 2073.100028] invoke_rcu_kthread(): 207 invocations/s
[ 2083.100018] invoke_rcu_kthread(): 213 invocations/s
[ 2093.100029] invoke_rcu_kthread(): 205 invocations/s
[ 2103.115541] invoke_rcu_kthread(): 217 invocations/s
[...]
[ 4104.208014] invoke_rcu_kthread(): 197 invocations/s
[ 4114.256282] invoke_rcu_kthread(): 206 invocations/s
[ 4124.301588] invoke_rcu_kthread(): 208 invocations/s
[ 4134.315642] invoke_rcu_kthread(): 211 invocations/s
[ 4144.317513] invoke_rcu_kthread(): 88 invocations/s
[ 4154.318120] invoke_rcu_kthread(): 2 invocations/s
[ 4164.318804] invoke_rcu_kthread(): 3 invocations/s
[ 4174.319487] invoke_rcu_kthread(): 3 invocations/s
[ 4184.320177] invoke_rcu_kthread(): 2 invocations/s
[ 4194.320857] invoke_rcu_kthread(): 6 invocations/s
[ 4204.321531] invoke_rcu_kthread(): 21 invocations/s
[ 4214.322115] invoke_rcu_kthread(): 6 invocations/s

I also checked the full dmesg, without grepping for anything.
On the other hand, I haven't experienced this issue under 3.0.0-rc5 yet, but then it never was reliably reproducible anyway.
Comment 7 Ferenc Wágner 2011-07-04 15:25:30 UTC
3.0.0-rc5 definitely exhibits the original issue. I have a hunch that it may be wireless-related: the load seems to rise after the access point I'm associated with weakens, and the system recovers seconds after my radio finally disassociates from the unreachable access point. Unfortunately, this can take half an hour or so. I'm using ipw2200. But this is just a feeling: I've got no hard data and can't easily test this theory either. Is there a way to log what rcu_kthread is actually doing? Or on whose behalf?
Comment 8 Paul E. McKenney 2011-07-07 15:57:46 UTC
Commit 6db4df143319 from linux-rcu should reduce the overhead.  In addition, there are trace-events in git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git rcu/next that show how many callbacks are invoked at a time.  This would give greater accuracy than the patch I sent you and that you backported (which does look good, by the way).

The 1256 callbacks/second is high, but should not result in problems.
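
For readers unfamiliar with what is being counted here: under TINY_RCU, rcu_kthread exists to invoke the callbacks that were queued with call_rcu() once a grace period has elapsed, so the callback rate translates directly into work for that thread. A minimal sketch of the producer side follows; struct foo, foo_reclaim() and foo_release() are hypothetical illustrations, while call_rcu(), struct rcu_head and kfree() are the real kernel APIs.

#include <linux/kernel.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

/* Hypothetical RCU-protected object; the embedded rcu_head is what gets queued. */
struct foo {
	int data;
	struct rcu_head rcu;
};

/* Runs from rcu_kthread (TINY_RCU) once a grace period has elapsed. */
static void foo_reclaim(struct rcu_head *head)
{
	struct foo *fp = container_of(head, struct foo, rcu);

	kfree(fp);
}

/* Queues one callback; a high call_rcu() rate means more work for rcu_kthread. */
static void foo_release(struct foo *fp)
{
	call_rcu(&fp->rcu, foo_reclaim);
}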
Comment 9 Ferenc Wágner 2011-07-11 11:29:23 UTC
(In reply to comment #8)
> Commit 6db4df143319 from linux-rcu should reduce the overhead.
Sorry, I'm afraid I can't find that commit (maybe because I've got no idea what linux-rcu is).

> In addition, there are trace-events in
> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git
> rcu/next that show how many callbacks are invoked at a time. This would
> give greater accuracy than the patch I sent you and that you backported
> (which does look good, by the way).

I've never used tracers yet, but I'll give them a try when I get around to it.

> The 1256 callbacks/second is high, but should not result in problems.

Indeed, that isn't the problem. The first readout of 1256 happens after 311 seconds of uptime, so it's actually a cumulative value. I'd be interested to know why it takes more than 5 minutes to get there, though.

As I described in comment #4, when rcu_kthread is spinning the CPU, your debug patch reports only 0 or 1 invocations per second, i.e. LESS than under normal operation (which is 2 or 3 -- not much of a difference, but clearly visible in the logs).
Comment 10 Paul E. McKenney 2011-07-11 17:17:40 UTC
Ah, sorry!  -rcu is a git tree at:

git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git

Please let me know if you need me to post a patch, but probably best if you can pull it from the git tree.
Comment 11 Ferenc Wágner 2011-07-12 17:09:47 UTC
Then I guessed right, but that tree doesn't seem to contain the commit 6db4df143319 you referenced; did you perhaps mean 53ef564d? Anyway, I pulled it and will give trace-events a shot.
Comment 12 Ferenc Wágner 2011-07-21 17:17:34 UTC
Created attachment 66312 [details]
RCU trace output around system breakdown (gzip compressed)

I directed trace_pipe into a file and enabled the RCU tracers with the following command:

szonett:/sys/kernel/debug/tracing/events/rcu# for i in rcu_*/enable; do echo 1 >$i; done

Eventually, I managed to reproduce the issue by clicking Disconnect in nm-applet (NetworkManager had got stuck on an unreachable access point). Part of the syslog (starting before the disconnect, to sample normal behaviour as well):

Jul 18 00:14:36 szonett kernel: [14412.145524] rcu_process_callbacks(): 23 invocations in 10 seconds
Jul 18 00:14:46 szonett kernel: [14422.146135] rcu_process_callbacks(): 20 invocations in 10 seconds
Jul 18 00:14:56 szonett kernel: [14432.146833] rcu_process_callbacks(): 31 invocations in 10 seconds
Jul 18 00:15:06 szonett kernel: [14442.147536] rcu_process_callbacks(): 26 invocations in 10 seconds
Jul 18 00:15:06 szonett NetworkManager[1158]: <info> (wlan): device state change: 8 -> 3 (reason 39)
Jul 18 00:15:06 szonett NetworkManager[1158]: <info> (wlan): deactivating device (reason: 39).
Jul 18 00:15:06 szonett NetworkManager[1158]: <info> (wlan): canceled DHCP transaction, DHCP client pid 5787
Jul 18 00:15:06 szonett dnsmasq[1149]: reading /etc/resolv.conf
Jul 18 00:15:14 szonett NetworkManager[1158]: <error> [1310940914.813483] [nm-system.c:1229] check_one_route(): (wlan): error -34 returned from rtnl_route_del(): Netlink Error (errno = Numerical result out of range)
Jul 18 00:15:14 szonett dnsmasq[1149]: using nameserver 213.46.246.54#53
Jul 18 00:15:14 szonett dnsmasq[1149]: using nameserver 213.46.246.53#53
Jul 18 00:15:14 szonett dnsmasq[1149]: no servers found in /etc/resolv.conf, will retry
Jul 18 00:15:19 szonett kernel: [14452.148249] rcu_process_callbacks(): 40 invocations in 10 seconds
Jul 18 00:15:33 szonett kernel: [14463.874158] rcu_process_callbacks(): 44 invocations in 11 seconds
Jul 18 00:16:26 szonett kernel: [14475.777381] rcu_process_callbacks(): 17 invocations in 11 seconds
Jul 18 00:16:26 szonett kernel: [14490.786453] rcu_process_callbacks(): 3 invocations in 15 seconds
Jul 18 00:16:26 szonett kernel: [14501.856736] rcu_process_callbacks(): 2 invocations in 11 seconds
Jul 18 00:16:26 szonett kernel: [14512.768836] rcu_process_callbacks(): 3 invocations in 10 seconds
Jul 18 00:16:33 szonett kernel: [14524.000400] rcu_process_callbacks(): 2 invocations in 11 seconds

I slightly modified your diagnostic patch to print the number of callbacks and the length of the actual interval instead of the callback rate. This shows how the timing gets erratic and the callback rate diminishes when the load jumps to 20. The same can be seen in the trace output, for example if you plot it in Gnuplot with a command like:

plot "<zcat ./rcu_trace_knee.txt.gz | cut -c 29-41" u 0:1 w d

Unfortunately I can't precisely match the trace timestamps with the kernel timestamps in the syslog (what's their relation?), but I think it's something like the above, i.e. both the syslog and the trace output include the breakdown point. I'll try enabling some WiFi driver traces as well, but that's really pure guesswork; I can't really interpret this output.
I'd much welcome further debugging ideas.
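
The interval-based modification mentioned above is not shown in the report; roughly, it would have looked something like the fragment below. This is a sketch only, mirroring the counting logic of the earlier invoke_rcu_kthread() patch and the message format visible in the syslog; the placement inside the callback-processing path and the variable names are guesses.

	/* Sketch: count invocations and report the real elapsed interval. */
	static unsigned long oldjiffies;
	static unsigned long invokecount;

	invokecount++;
	if (UINT_CMP_GE(jiffies, oldjiffies + 10 * HZ)) {
		printk(KERN_ALERT
		       "rcu_process_callbacks(): %lu invocations in %lu seconds\n",
		       invokecount, (jiffies - oldjiffies) / HZ);
		oldjiffies = jiffies;
		invokecount = 0;
	}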
