Bug 16265 - Why is kslowd accumulating so much CPU time?
Summary: Why is kslowd accumulating so much CPU time?
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_video-dri
URL:
Keywords:
Duplicates: 18802
Depends on:
Blocks: 16055
Reported: 2010-06-21 19:04 UTC by Rafael J. Wysocki
Modified: 2011-04-09 17:45 UTC
CC List: 9 users

See Also:
Kernel Version: 2.6.35-rc2
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
kslowd in 2.6.37 (69.51 KB, image/png)
2011-01-18 18:19 UTC, Bc. Slavomir Danas
kworker polling hogs CPU on 2.6.38 (81.01 KB, image/png)
2011-04-09 16:51 UTC, Bc. Slavomir Danas

Description Rafael J. Wysocki 2010-06-21 19:04:09 UTC
Subject    : Why is kslowd accumulating so much CPU time?
Submitter  : "Theodore Ts'o" <tytso@mit.edu>
Date       : 2010-06-09 18:36
Message-ID : E1OMQ88-0002a1-Gb@closure.thunk.org
References : http://marc.info/?l=linux-kernel&m=127610857819033&w=4

This entry is being used for tracking a regression from 2.6..  Please don't
close it until the problem is fixed in the mainline.

Caused by:

commit fbf81762e385d3d45acad057b654d56972acf58c
Author: Dave Airlie <airlied@redhat.com>
Date:   Tue Jun 1 09:09:06 2010 +1000

    drm/kms: disable/enable poll around switcheroo on/off
    
    Because we aren't in a suspend state the poll will still run when we have switcherooed a card off.
    
    Signed-off-by: Dave Airlie <airlied@redhat.com>

First-Bad-Commit : fbf81762e385d3d45acad057b654d56972acf58c
Comment 1 Rafael J. Wysocki 2010-07-23 20:03:18 UTC
On Friday, July 23, 2010, Nick Bowler wrote:
> On 13:47 Fri 23 Jul, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a summary report
> > of recent regressions.
> > 
> > The following bug entry is on the current list of known regressions
> > from 2.6.34.  Please verify if it still should be listed and let the tracking team
> > know (either way).
> > 
> > 
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=16265
> > Subject     : Why is kslowd accumulating so much CPU time?
> > Submitter   : Theodore Ts'o <tytso@mit.edu>
> > Date        : 2010-06-09 18:36 (45 days old)
> 
> I actually haven't been able to reproduce with kernels since a couple
> weeks ago (still reproducible if I go back to 2.6.35-rc3, though).
> Seems to be fixed somehow, can anyone else confirm?
Comment 2 Thomas Lindroth 2010-07-24 13:19:51 UTC
I'm still experiencing what I believe to be this bug on 2.6.35-rc6. The mouse stalls every few seconds, and the kslowd threads use a lot of CPU when it does. The problem started after upgrading from 2.6.34.

I'm using an Intel 965GM connected via HDMI.
Comment 3 alfonso.pola 2010-08-12 00:15:51 UTC
I'm having this problem with kernel 2.6.35.1 on Arch Linux. I'm willing to provide debugging information; I just don't know what I should post. I have a Lenovo SL300 laptop with Intel integrated graphics (HD4500, I believe).
Comment 4 Chris Wilson 2010-08-13 08:13:45 UTC
Ok, we understand what's going on here now. Every 10s the non-hotplug capable outputs are polled for connection/disconnection events. For a certain class of analog device on some hardware this is very slow and CPU intensive.

See also https://bugs.freedesktop.org/show_bug.cgi?id=29536

The first workaround proposed is just to disable polling via a module parameter. The eventual solution will likely be a mix of finer grained locking and cheaper, non-destructive polling.
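The polling workaround mentioned above is the drm_kms_helper "poll" module parameter, which later comments set with drm_kms_helper.poll=N on the kernel command line. As a minimal sketch for checking whether that workaround is actually in effect, the C helpers below read the parameter back from sysfs. They assume the parameter is exposed at /sys/module/drm_kms_helper/parameters/poll and reads as "Y" or "N" like other boolean module parameters; the file may be readable by root only. The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed sysfs location of the drm_kms_helper "poll" parameter. */
#define POLL_PARAM_PATH "/sys/module/drm_kms_helper/parameters/poll"

/* Interpret the text of a boolean module parameter (hypothetical helper).
 * Returns 1 for "Y"/"y"/"1", 0 for "N"/"n"/"0", -1 for anything else. */
int parse_bool_param(const char *value)
{
    assert(value != NULL);
    switch (value[0]) {
    case 'Y': case 'y': case '1':
        return 1;
    case 'N': case 'n': case '0':
        return 0;
    default:
        return -1; /* includes the empty string */
    }
}

/* Read the parameter file at path (hypothetical helper).
 * Returns 1 if polling is enabled, 0 if disabled, and -1 on any error
 * (file missing, module not loaded, or permission denied). */
int read_poll_param(const char *path)
{
    char buf[8] = {0};
    FILE *f = fopen(path, "r");
    int ok;

    if (f == NULL)
        return -1;
    ok = fgets(buf, sizeof buf, f) != NULL;
    fclose(f);
    return ok ? parse_bool_param(buf) : -1;
}
```

Run as root, read_poll_param(POLL_PARAM_PATH) would be expected to return 0 on a system booted with drm_kms_helper.poll=N; a return value of -1 only means the value could not be read.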
Comment 5 alfonso.pola 2010-08-13 19:51:15 UTC
ok, cool.

PS: The link is http://bugs.freedesktop.org/show_bug.cgi?id=29536
Comment 6 Florian Mickler 2010-09-07 06:37:49 UTC
Handled-By: Chris Wilson <chris@chris-wilson.co.uk>
References: http://bugs.freedesktop.org/show_bug.cgi?id=29536
Comment 7 Takehiko Abe 2010-09-22 11:58:39 UTC
The "automatic workaround" does not work for me. I needed to set the
module parameter to "disable polling". There is a similar bug
reported:

Bug 18802 kworker: high CPU usage -> system sluggish
https://bugzilla.kernel.org/show_bug.cgi?id=18802
Comment 8 Florian Mickler 2011-01-12 10:48:30 UTC
*** Bug 18802 has been marked as a duplicate of this bug. ***
Comment 9 Florian Mickler 2011-01-12 11:19:26 UTC
Is this issue still visible on 2.6.37?
Comment 10 Florian Mickler 2011-01-12 11:37:46 UTC
I'm closing this for now; please reopen or shout if this issue is still unresolved for you.

This should have been fixed for intel since 2.6.36-rc5: 

commit 930a9e283516a3a3595c0c515113f1b78d07f695
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Sep 14 11:07:23 2010 +0100

    drm: Use a nondestructive mode for output detect when polling (v2)


For nouveau, it's probably fixed since v2.6.37-rc3:

commit 01db363979e96115a895f35c823303660f0f328d
Author: Francisco Jerez <currojerez@riseup.net>
Date:   Thu Oct 21 17:43:08 2010 +0200

    drm/nouveau: Use "force" to decide if analog load detection is ok or not.

and for radeon since v2.6.37-rc1:

commit c3cceeddf0b5f97b0d2352b98ef0f025e31a9ae3
Author: Dave Airlie <airlied@redhat.com>
Date:   Tue Oct 26 12:55:52 2010 +1000

    drm/radeon/kms: don't poll dac load detect.
Comment 11 Bc. Slavomir Danas 2011-01-15 20:42:49 UTC
This issue is still present for me with an Intel 4500 MHD (intel driver 2.14.0, xorg-server 1.9.3.901, mesa 7.9.1, libdrm 2.4.23) on gentoo-sources 2.6.37. I read this announcement and removed my drm_kms_helper.poll=N kernel boot parameter, but the issue came back a few minutes after booting. The system was sluggish and a kworker process was eating 60% CPU.
Comment 12 Chris Wilson 2011-01-15 22:35:48 UTC
(In reply to comment #11)
> This issue is still present for me with Intel 4500 MHD (intel driver 2.14.0,
> xorg-server 1.9.3.901, mesa 7.9.1, libdrm 2.4.23) with gentoo-sources 2.6.37.
> I've read this announcement and disabled my drm_kms_helper.poll=N kernel boot
> param but the issue has come up after few minutes after booting. System was
> sluggish and kworker process appeared eating 60% CPU.

Different culprit: you have neither the affected hardware nor the responsible hotplug polling enabled.
Comment 13 Bc. Slavomir Danas 2011-01-16 09:04:37 UTC
I was redirected here from https://bugzilla.kernel.org/show_bug.cgi?id=18802, so it is probably not a duplicate of this bug, because the workaround there did work for me.
Comment 14 Florian Mickler 2011-01-16 12:46:57 UTC
You write in comment #11 about 'disabling the boot drm_kms_helper.poll=N'...

Did you mean that the problem is still present in 2.6.37 (final) and only solved by putting drm_kms_helper.poll=N on the command line,

or

that you experience the symptoms even with polling disabled via drm_kms_helper.poll=N on the command line?


Regards,
Flo
Comment 15 Bc. Slavomir Danas 2011-01-16 19:45:49 UTC
I've had the drm_kms_helper.poll=N kernel workaround in my GRUB config since I discovered it in bug 18802 (around 2.6.36-r2, I think), because I had suffered from the udev DRM polling storm in 2.6.34, kslowd in 2.6.35 and kworker in 2.6.36. Since I applied it, everything has worked smoothly. I read the announcement that bug 18802 was a duplicate of this bug and resolved as of 2.6.36-rc5, so I removed this kernel setting and rebooted into my 2.6.37 kernel, hoping that everything would be fine. But within a few minutes the same symptoms appeared, so I went back to the drm_kms_helper.poll=N kernel option, and everything seems all right again (with the workaround). I can provide more information on my hardware or software on request.
Comment 16 Chris Wilson 2011-01-16 22:46:41 UTC
You need to quantify your observations here. When polling is enabled, the worker is woken every 10s and consumes CPU time checking that all the displays are the same as before. Unless that takes a ridiculous amount of CPU time on your machine (say, greater than 5%), it is just the cost of your computer being able to detect display changes automatically.

The real bug is the latency it induces... But I digress.
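The 5% rule of thumb above can be made concrete: with a fixed 10 s polling period, the share of one CPU spent polling is the duration of one poll pass divided by the period. A minimal C sketch of that arithmetic follows; the function names are hypothetical, and the threshold is the figure quoted in the comment above, not a kernel constant.

```c
#include <assert.h>

/* Percentage of one CPU spent polling when each poll pass takes
 * poll_ms milliseconds and passes run every period_ms milliseconds. */
double poll_cpu_percent(double poll_ms, double period_ms)
{
    assert(period_ms > 0.0);
    assert(poll_ms >= 0.0);
    return 100.0 * poll_ms / period_ms;
}

/* Nonzero when the polling cost is strictly above threshold_percent
 * (for example, the 5% figure used as a rule of thumb above). */
int poll_cost_excessive(double poll_ms, double period_ms, double threshold_percent)
{
    return poll_cpu_percent(poll_ms, period_ms) > threshold_percent;
}
```

For example, a 500 ms pass every 10 s (poll_cpu_percent(500.0, 10000.0)) gives 5.0, right at the quoted threshold, while a 50 ms pass every 10 s gives 0.5.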
Comment 17 Bc. Slavomir Danas 2011-01-18 18:18:36 UTC
It doesn't seem sane to me to consume this much CPU time polling for anything (see the attached screenshot). I figured out a way to reproduce this: I enabled RandR polling in KDE, attached an LCD TV via HDMI and played a video in VLC. After a few mouse clicks in the VLC menu, the mouse movement became jerky, and later X froze completely. I switched to a console and found that when I unplugged HDMI, the kworker process went quiet.
I was able to take one screenshot, but as X froze I saw 3 kworker processes running and eating all the CPU time, rendering my system useless (no chance to take a screenshot at that moment).
To repeat: I'm running the latest gentoo-sources-2.6.37.
Comment 18 Bc. Slavomir Danas 2011-01-18 18:19:30 UTC
Created attachment 44072
kslowd in 2.6.37
Comment 19 Bc. Slavomir Danas 2011-04-09 16:48:51 UTC
You should definitely reopen this, as I still have issues on 2.6.38. I will provide anything that helps you resolve it. See the following attachment.
Comment 20 Bc. Slavomir Danas 2011-04-09 16:51:37 UTC
Created attachment 53912
kworker polling hogs CPU on 2.6.38

Gentoo x64
gentoo-sources 2.6.38
HP ProBook 4510s

00:02.0 VGA compatible controller [0300]: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller [8086:2a42] (rev 07) (prog-if 00 [VGA controller])
        Subsystem: Hewlett-Packard Company Device [103c:3072]
        Flags: bus master, fast devsel, latency 0, IRQ 44
        Memory at d0000000 (64-bit, non-prefetchable) [size=4M]
        Memory at c0000000 (64-bit, prefetchable) [size=256M]
        I/O ports at 50f0 [size=8]
        Expansion ROM at <unassigned> [disabled]
        Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
        Capabilities: [d0] Power Management version 3
        Kernel driver in use: i915

Attached external LCD TV at 720p via HDMI cable
Comment 21 Chris Wilson 2011-04-09 17:01:53 UTC
Please do a "perf top" to see what tasklet is actually being run by kslowd.
Comment 22 Bc. Slavomir Danas 2011-04-09 17:45:57 UTC
This is while kslowd consumed approx. 75% CPU:


   PerfTop:       0 irqs/sec  kernel:-nan%  exact: -nan% [1000Hz cycles],  (all, 2 CPUs)

             3215.00 86.8% delay_tsc                     /lib/modules/2.6.38-ck-BKL/build/vmlinux
              163.00  4.4% read_hpet                     /lib/modules/2.6.38-ck-BKL/build/vmlinux
              110.00  3.0% set_clock                     /lib/modules/2.6.38-ck-BKL/build/vmlinux
               55.00  1.5% get_clock                     /lib/modules/2.6.38-ck-BKL/build/vmlinux
               45.00  1.2% get_data                      /lib/modules/2.6.38-ck-BKL/build/vmlinux
               28.00  0.8% rb_next                       /lib/modules/2.6.38-ck-BKL/build/vmlinux
                7.00  0.2% _raw_spin_lock_irqsave        /lib/modules/2.6.38-ck-BKL/build/vmlinux
                7.00  0.2% hpet_legacy_next_event        /lib/modules/2.6.38-ck-BKL/build/vmlinux
                7.00  0.2% set_data                      /lib/modules/2.6.38-ck-BKL/build/vmlinux
                6.00  0.2% __pthread_mutex_lock_internal /lib64/libpthread-2.11.3.so             
                5.00  0.1% acpi_os_read_port             /lib/modules/2.6.38-ck-BKL/build/vmlinux
