Bug 17732

Summary: [2.6.35.x regression] rcu_preempt_state stall warning and machine slow-downs
Product: Platform Specific/Hardware
Reporter: Maciej Rutecki (maciej.rutecki)
Component: x86-64
Assignee: platform_x86_64 (platform_x86_64)
Status: CLOSED CODE_FIX
Severity: normal
CC: maciej.rutecki, martink, oskarw85.spam, rjw, ua_bugzilla_linux-kernel
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.35 Tree: Mainline
Regression: Yes
Bug Depends on:    
Bug Blocks: 16055    
Attachments: kernel stall warning
anoter rcu stall warning
rcu stall w/ nasty slowdown
Fix by Thomas Gleixner

Description Maciej Rutecki 2010-09-03 18:48:29 UTC
Subject    : [2.6.35.x regression] rcu_preempt_state stall warning and machine slow-downs
Submitter  : Matthias Dahl <ml_kernel@mortal-soul.de>
Date       : 2010-09-01 6:47
Message-ID : 201009010847.20236.ml_kernel@mortal-soul.de
References : http://marc.info/?l=linux-kernel&m=128332418629305&w=2

This entry is being used for tracking a regression from 2.6.34. Please don't
close it until the problem is fixed in the mainline.
Comment 1 Matthias Dahl 2010-09-06 12:12:31 UTC
Created attachment 29122
kernel stall warning

This just happened to me again while shutting down. It took several minutes for the shutdown to complete for no apparent reason.
Comment 2 Matthias Dahl 2010-09-08 07:07:06 UTC
Created attachment 29242
anoter rcu stall warning

Sorry for posting again, but this bug is really nerve-racking and my knowledge of the kernel internals is not complete enough to trace this myself.

Attached is another instance of an RCU stall warning. This almost exclusively happens while the machine is under load. All kernels prior to 2.6.35 are fine. What pops up in all backtraces is that one core is in delay_tsc(). I don't know if this is relevant.

Yesterday I saw the following effect, which falls under the slowdown category: while watching a video w/ vlc, the video and audio started stuttering and mostly recovered whenever I moved the mouse around. As soon as I stopped, the stuttering returned. This has happened before too, exclusively w/ 2.6.35.x. :-(
Comment 3 Matthias Dahl 2010-09-08 07:09:35 UTC
Forgot the following important bit: restarting X or anything similar did not help; I had to restart the machine, which took longer than usual. There were no processes stuck or consuming any relevant amount of CPU time.
Comment 4 Matthias Dahl 2010-09-14 07:54:19 UTC
Created attachment 29922
rcu stall w/ nasty slowdown

This is a backtrace from an RCU stall warning that happened w/ a nasty slowdown.
Comment 5 Martin Kepplinger 2010-09-15 11:27:29 UTC
I have the exact same problem; you described it very well. I use a Lenovo G550 laptop.

/var/log/messages shows 

Sep 15 12:15:51 mobil pulseaudio[1456]: ratelimit.c: 19 events suppressed
Sep 15 12:15:57 mobil pulseaudio[1456]: ratelimit.c: 25 events suppressed
Sep 15 12:16:02 mobil pulseaudio[1456]: ratelimit.c: 6 events suppressed

during video playback. These warnings stop when I stop the video. The system, however, stays unusably slow / "stucky". Reboot takes too long at that point, so I mostly interrupt it.

Could you tell me where I can check for these RCU stall warnings as well?
thanks!
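
On where to check: RCU stall warnings are printed by the kernel, so they end up in the kernel log (dmesg) and usually in the syslog file as well. Below is a minimal sketch of a scanner for such lines. It assumes the 2.6.35-era wording, something like "INFO: rcu_preempt_state detected stall on CPU ..."; the exact message text and the log path differ between kernel versions and distributions, and the names find_rcu_stalls and scan_log are made up for this illustration.

```python
import re

# Assumed pattern: lines such as
# "INFO: rcu_preempt_state detected stall on CPU 0 (t=... jiffies)".
# The exact wording is an assumption and changes between kernel versions.
STALL_RE = re.compile(r"\brcu\w*\s+detected\s+stalls?\b", re.IGNORECASE)


def find_rcu_stalls(lines):
    """Return (line_number, text) pairs for lines that look like RCU stall warnings."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if STALL_RE.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log(path="/var/log/messages"):
    """Scan a log file for stall warnings.

    The default path is an assumption; it varies by distribution.
    """
    with open(path, errors="replace") as handle:
        return find_rcu_stalls(handle)
```

Calling scan_log() needs read access to the log file; the same find_rcu_stalls() function can also be fed the lines of saved dmesg output.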
Comment 6 Martin Kepplinger 2010-09-15 17:14:07 UTC
Created attachment 30142
Fix by Thomas Gleixner

This fixes the problem for me on 2.6.35. I'll test it against a current tree as well. Author: Thomas Gleixner
Comment 7 Thomas Gleixner 2010-09-21 10:25:18 UTC
> --- Comment #6 from Martin Kepplinger <martinkepplinger@eml.cc>  2010-09-15
> 17:14:07 ---
> Created an attachment (id=30142)
>  --> (https://bugzilla.kernel.org/attachment.cgi?id=30142)
> Fix by Thomas Gleixner
> 
> This fixes the problem for me on 2.6.35. I'll test it against a current tree
> as well. Author: Thomas Gleixner

That fix is in 2.6.35.5 now. Can the other reporters please re-test?

Thanks,

	tglx
Comment 8 Martin Kepplinger 2010-09-21 15:54:58 UTC
I'm running 2.6.35.5 without problems. I can really call it stable (and actually use it) now.

Thanks for the effort,

martin
Comment 9 Matthias Dahl 2010-09-22 05:11:15 UTC
Hi.

Sorry for my late response. I've been running 2.6.35.4 w/ both patches posted to the kernel list (by Thomas Gleixner) for several days now and have seen no more rcu stall warnings or slowdowns.

Thanks _a lot_ for the fix.

So long,
matthias.