Bug 18802

Summary: kworker: high CPU usage -> system sluggish
Product: Process Management
Reporter: Török Edwin (edwin+bugs)
Component: Scheduler
Assignee: Ingo Molnar (mingo)
Status: CLOSED DUPLICATE
Severity: normal
CC: florian, keke, lenb, slavomir.danas
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.36-rc4
Subsystem:
Regression: No
Bisected commit-id:
Attachments: cfs debug info logs (includes .config)

Description Török Edwin 2010-09-19 21:24:36 UTC
Created attachment 30632
cfs debug info logs (includes .config)

I noticed that the system sometimes becomes sluggish; top shows me a bunch of 'kworker' processes at ~40-60% CPU usage.
By "sluggish" I mean high latency, i.e. the mouse is slow to move in X, switching processes is slow, etc.

This happens, for example, when I run 'winetricks ie6' or 'winetricks ie7', just as it is about to finish installing them. I'll try to find out exactly what that script (or wine) does that triggers this issue.

I launched 'sudo perf top' and it showed me this:

  4953.00 62.9% delay_tsc                             [kernel]

Usually delay_tsc is at 10%, and there is no noticeable latency then (although why should delay_tsc take so much time? Does it have something to do with audio? I have mplayer running all the time...)

Attaching my .config

Here is a snapshot of top when this happens:
 4412 root      20   0  182m  34m 4260 D   23  0.9  96:53.35 Xorg                                                         
  683 root      20   0     0    0    0 S   17  0.0   0:02.79 kworker/0:0                                                  
 9634 root      20   0     0    0    0 R   16  0.0   3:29.24 kworker/1:2                                                  
 5572 edwin     20   0  830m 187m 9016 S   14  4.7  14:18.05 claws-mail                                                   
  676 root      20   0     0    0    0 S   11  0.0   0:01.24 kworker/2:2                                                  
  687 root      20   0     0    0    0 S    6  0.0   1:21.54 kworker/3:1  

I did a 'perf sched record', but 'perf sched replay' doesn't reproduce the issue for me (also, it uses one CPU at 100%, whereas in the original case several CPUs were each used a bit).

I've run the cfs-debug-info script too, will attach its output.
Comment 1 Takehiko Abe 2010-09-22 11:46:09 UTC
The symptom looks similar to:

  Bug 16265 -  Why is kslowd accumulating so much CPU time?
  https://bugzilla.kernel.org/show_bug.cgi?id=16265

kslowd was replaced with kworker some time ago:

| commit 991ea75cb1df7188d209274b3d51c105b4f18ffe
| Author: Tejun Heo <tj@kernel.org>
| Date:   Tue Jul 20 22:09:02 2010 +0200
| 
|     drm: use workqueue instead of slow-work

There are two patches committed for the bug #16265.
One provides a module parameter to "disable polling".
Another is for "automatic workaround".

Obviously the "automatic workaround" is not working for us.
But setting the module parameter worked for me.

  drm/kms: Add a module parameter to disable polling
  http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e58f637bb96d5a0ae0919b9998b891d1ba7e47c9

Try booting with drm_kms_helper.poll=N or setting the param at
/sys/module/drm_kms_helper/parameters/poll . It might work for you
too.
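
For the runtime route, a minimal Python sketch, assuming the parameter file is writable by root and takes 'Y'/'N' as boolean module parameters normally do. The path is the one given above; the helper names read_poll and disable_poll are made up for illustration:

```python
from pathlib import Path

# Path quoted above; it only exists while drm_kms_helper is loaded.
POLL_PARAM = Path("/sys/module/drm_kms_helper/parameters/poll")


def read_poll(param: Path = POLL_PARAM) -> str:
    """Return the current value of the poll parameter, e.g. 'Y' or 'N'."""
    return param.read_text().strip()


def disable_poll(param: Path = POLL_PARAM) -> None:
    """Write 'N' to the parameter file (needs root on a real system)."""
    param.write_text("N\n")
```

Calling read_poll(), then disable_poll(), then read_poll() again as root should show the value change to 'N'. The setting does not survive a reboot; for that, use drm_kms_helper.poll=N on the kernel command line as described above.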
Comment 2 Török Edwin 2010-09-29 08:51:14 UTC
(In reply to comment #1)
> Try booting with drm_kms_helper.poll=N or setting the param at
> /sys/module/drm_kms_helper/parameters/poll . It might work for you
> too.

Thanks, but I'm having trouble reproducing this on a fresh boot.
kworker shows up for only a second or so in top, and then it is gone.

Looking at the dmesg I attached to this bug report, I see this:
[24541.100953] [drm:radeon_dvi_detect] *ERROR* DVI-I-1: probed a monitor but no|invalid EDID

I don't see that message now, and I don't know under what circumstances that DVI-I-1 probing occurred. I tried 'xset dpms force off', waiting 30s, and turning the monitor off with its button; the message still didn't show up.

I guess the kworker high CPU usage occurs only after it failed to probe the DVI-I-1 connector.
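
One way to check that guess would be to search the kernel log for the EDID failure whenever the kworker load shows up. A minimal Python sketch, assuming dmesg output in the bracketed-timestamp format quoted above; the function name find_edid_failures is hypothetical:

```python
import re

# Matches lines such as:
# [24541.100953] [drm:radeon_dvi_detect] *ERROR* DVI-I-1: probed a monitor but no|invalid EDID
EDID_ERROR = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm:(?P<func>\w+)\]\s+\*ERROR\*\s+"
    r"(?P<connector>[\w-]+): probed a monitor but no\|invalid EDID"
)


def find_edid_failures(log_text):
    """Return (timestamp, connector) for each EDID probe failure in log_text."""
    hits = []
    for line in log_text.splitlines():
        m = EDID_ERROR.match(line.strip())
        if m:
            hits.append((float(m.group("ts")), m.group("connector")))
    return hits
```

Feeding it saved dmesg output taken while kworker is busy and again while the system is idle would show whether the two really go together.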

Regarding the 'automatic workaround', if you are referring to 7b334fcb45b757ffb093696ca3de1b0c8b4a33f1, that only implements it for Intel cards, so obviously it won't work on radeon cards!
Comment 3 Takehiko Abe 2010-10-04 07:33:43 UTC
> Regarding the 'automatic workaround', if you are referring to
> 7b334fcb45b757ffb093696ca3de1b0c8b4a33f1, that only implements it
> for Intel cards, so obviously it won't work on radeon cards!

Yes, that's the one. And I see now that it doesn't do anything for
radeon.  I have an intel graphics system and the patch does not work
for it either.

But I must say that my problem is more benign than the symptoms
reported by you and #16265 -- e.g. mouse stalls. I noticed the problem
through audio playback glitches.
Comment 4 Bc. Slavomir Danas 2010-11-27 19:05:47 UTC
(In reply to comment #1)
> Try booting with drm_kms_helper.poll=N or setting the param at
> /sys/module/drm_kms_helper/parameters/poll . It might work for you
> too.

Awesome! I had been looking for a workaround for ages (since 2.6.34, where I suffered from a udev drm polling storm, which changed to kslowd CPU hogging in 2.6.35 and the same problem with kworker in 2.6.36).

My system is snappy again! Thanks!
Comment 5 Florian Mickler 2011-01-12 10:48:30 UTC

*** This bug has been marked as a duplicate of bug 16265 ***