Bug 17732 - [2.6.35.x regression] rcu_preempt_state stall warning and machine slow-downs
Status: CLOSED CODE_FIX
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: x86-64
Hardware: All Linux
Importance: P1 normal
Assignee: platform_x86_64@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks: 16055
Reported: 2010-09-03 18:48 UTC by Maciej Rutecki
Modified: 2010-09-26 20:29 UTC
CC List: 5 users

See Also:
Kernel Version: 2.6.35
Tree: Mainline
Regression: Yes


Attachments
kernel stall warning (44.75 KB, text/plain)
2010-09-06 12:12 UTC, Matthias Dahl
another rcu stall warning (3.31 KB, application/x-gzip)
2010-09-08 07:07 UTC, Matthias Dahl
rcu stall w/ nasty slowdown (3.55 KB, application/x-gzip)
2010-09-14 07:54 UTC, Matthias Dahl
Fix by Thomas Gleixner (3.97 KB, patch)
2010-09-15 17:14 UTC, Martin Kepplinger

Description Maciej Rutecki 2010-09-03 18:48:29 UTC
Subject    : [2.6.35.x regression] rcu_preempt_state stall warning and machine slow-downs
Submitter  : Matthias Dahl <ml_kernel@mortal-soul.de>
Date       : 2010-09-01 6:47
Message-ID : 201009010847.20236.ml_kernel@mortal-soul.de
References : http://marc.info/?l=linux-kernel&m=128332418629305&w=2

This entry is being used for tracking a regression from 2.6.34. Please don't
close it until the problem is fixed in the mainline.
Comment 1 Matthias Dahl 2010-09-06 12:12:31 UTC
Created attachment 29122
kernel stall warning

This just happened to me again while shutting down. It took several minutes for the shutdown to complete for no apparent reason.
Comment 2 Matthias Dahl 2010-09-08 07:07:06 UTC
Created attachment 29242
another rcu stall warning

Sorry for posting again, but this bug is really nerve-wracking, and my knowledge of the kernel internals is not complete enough for me to trace this myself.

Attached is another instance of an rcu stall warning. This happens almost exclusively while the machine is under load. All kernels prior to 2.6.35 are fine. What pops up in all backtraces is that one core is in delay_tsc(). I don't know if this is relevant.

Yesterday I got the following effect, which falls under the slowdown category: while watching a video w/ vlc, the video and audio started stuttering and always mostly recovered while I was moving the mouse around. As soon as I stopped, it was back to stuttering. This has happened before too, exclusively w/ 2.6.35.x. :-(
Comment 3 Matthias Dahl 2010-09-08 07:09:35 UTC
Forgot the following important bit: restarting X or anything similar did not help; I had to restart the machine, which took longer than usual. There were no processes stuck or consuming any relevant amount of CPU time.
Comment 4 Matthias Dahl 2010-09-14 07:54:19 UTC
Created attachment 29922
rcu stall w/ nasty slowdown

This is a backtrace from an rcu stall warning that happened w/ a nasty slowdown.
Comment 5 Martin Kepplinger 2010-09-15 11:27:29 UTC
I have the exact same problem; you described it very well. I use a Lenovo G550 laptop.

/var/log/messages shows 

Sep 15 12:15:51 mobil pulseaudio[1456]: ratelimit.c: 19 events suppressed
Sep 15 12:15:57 mobil pulseaudio[1456]: ratelimit.c: 25 events suppressed
Sep 15 12:16:02 mobil pulseaudio[1456]: ratelimit.c: 6 events suppressed

during video playback. These warnings stop when I stop the video. The system, however, stays unusably slow / "stuck". Rebooting takes too long at that point, so I mostly interrupt it.

Could you tell me where I can check for these rcu stall warnings as well?
Thanks!
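
For reference: rcu stall warnings are printed to the kernel log, so they can be looked for in the output of dmesg and, on many distributions, in /var/log/messages. Below is a minimal sketch of a filter for such lines. It assumes the warnings look like "INFO: rcu_preempt_state detected stall on CPU ..." (this wording is an assumption based on the bug title and may differ between kernel versions); the function name find_rcu_stalls is hypothetical and not part of any existing tool.

```python
import re

# Assumption: a stall warning line contains "INFO: <name> detected stall(s) on CPU...",
# where <name> is the RCU flavor, e.g. rcu_preempt_state. Adjust if your kernel differs.
STALL_RE = re.compile(r"INFO: (\w+) detected stalls? on CPU")


def find_rcu_stalls(lines):
    """Return (line_number, rcu_state_name, text) for each line that looks like a stall warning."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = STALL_RE.search(line)
        if match:
            hits.append((number, match.group(1), line.rstrip("\n")))
    return hits


# Example usage (the path is an assumption; the log location varies by distribution):
#
#     with open("/var/log/messages", errors="replace") as log:
#         for number, name, text in find_rcu_stalls(log):
#             print(f"{number}: [{name}] {text}")
```

Saved output from dmesg can be passed to the same function as a list of lines. If nothing matches even though the slowdown occurred, the warning format on that kernel may simply differ from the assumed pattern.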
Comment 6 Martin Kepplinger 2010-09-15 17:14:07 UTC
Created attachment 30142
Fix by Thomas Gleixner

This fixes the problem for me on 2.6.35. I'll test it against a current tree as well. Author: Thomas Gleixner
Comment 7 Thomas Gleixner 2010-09-21 10:25:18 UTC
> --- Comment #6 from Martin Kepplinger <martinkepplinger@eml.cc>  2010-09-15
> 17:14:07 ---
> Created an attachment (id=30142)
>  --> (https://bugzilla.kernel.org/attachment.cgi?id=30142)
> Fix by Thomas Gleixner
> 
> This fixes the problem for me on 2.6.35. I'll test it against a current tree
> aswell. Author: Thomas Gleixner

That fix is in 2.6.35.5 now. Can the other reporters please re-test?

Thanks,

	tglx
Comment 8 Martin Kepplinger 2010-09-21 15:54:58 UTC
I'm running 2.6.35.5 without problems. I can really call it stable (and actually use it) now.

Thanks for the effort,

martin
Comment 9 Matthias Dahl 2010-09-22 05:11:15 UTC
Hi.

Sorry for my late response. I've been running 2.6.35.4 w/ both patches posted to the kernel list (by Thomas Gleixner) for several days now and have seen no more rcu stall warnings or slowdowns.

Thanks _a lot_ for the fix.

So long,
matthias.
