Bug 15328

Summary: high load avg, extreme sluggishness on T41 w/ Radeon Mobility M7
Product: Drivers Reporter: John W. Linville (linville)
Component: Video(DRI - non Intel) Assignee: drivers_video-dri
Status: CLOSED CODE_FIX    
Severity: normal CC: bjaglin, chantry.xavier, currojerez, glisse, jb.faq, liquid.acid, maciej.rutecki, rjw, suokkos, thellstrom
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.33-rc8 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 14885    
Attachments: ttm_set_caching_nopat.patch
ttm_set_caching_nopat_2.patch

Description John W. Linville 2010-02-16 20:25:40 UTC
After loading a recent wireless-testing kernel (based on 2.6.33-rc8) on my T41 I noticed an unacceptable sluggishness.  Opening gnome-terminal took a long time, firefox took so long I rebooted before it ever came up, etc.  The w command continually reported a load average of between 2 and 3 on an apparently idle box.

A git bisect session identified this patch:

commit db78e27de7e29a6db6be7caf607cf803d84094aa
Author: Francisco Jerez <currojerez@riseup.net>
Date:   Tue Jan 12 18:49:43 2010 +0100

    drm/ttm: Avoid conflicting reserve_memtype during ttm_tt_set_page_caching.

    Fixes errors like:
    > reserve_ram_pages_type failed 0x15b7a000-0x15b7b000, track 0x8, req 0x10
    when a BO is moved between WC and UC areas.

    Reported-by: Xavier Chantry <shiningxc@gmail.com>
    Signed-off-by: Francisco Jerez <currojerez@riseup.net>
    Acked-by: Thomas Hellstrom <thellstrom@vmware.com>
    Signed-off-by: Dave Airlie <airlied@redhat.com>

Just to verify, I went back to my wireless-testing tree and reverted that patch.  This restored normal/acceptable performance.

If you need/want testing or whatnot just let me know...
Comment 1 Xavier Chantry 2010-02-16 20:36:49 UTC
It was also reported here :
http://www.mail-archive.com/dri-devel@lists.sourceforge.net/msg47164.html

I asked curro about it but he had no idea :
17:13 < shining> curro__: did you see "Problem with drm-radeon-testing commit - drm/ttm: Avoid conflicting reserve_memtype during ttm_tt_set_page_caching." ?
17:20 < curro__> shining: not sure about that

It seems to be an issue only for few radeon users.
Comment 2 John W. Linville 2010-02-16 20:43:30 UTC
Definitely the same commit, and it sounds more or less like the same problem, although in the report above it doesn't sound that terrible, whereas on this T41 (single 1.6GHz i686) it cripples the box.
Comment 3 Thomas Hellstrom 2010-02-16 21:24:27 UTC
Really, the Radeon driver *should* never change caching policy from WC to UC, and thus should never hit the code path affected by that patch. However, if it does, and particularly if it does so frequently, it's quite natural that the patch causes severe CPU overhead, since a transition from WC->WB->UC causes a cache flush whereas WC->UC doesn't.

Maybe Jerome has some input?
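
The cost argument in comment 3 can be illustrated with a toy model. This is only a sketch and not kernel code: the `Caching` enum and `flushes_for` helper are hypothetical names, and the one rule it encodes (a flush is charged only when a page leaves WB for a non-WB type) is an assumption taken from the comment's claim, not from the x86 or TTM sources.

```python
# Toy model of the cost argument in comment 3. Not kernel code.
# Assumption (from the comment): leaving WB for a non-WB type needs a
# CPU cache flush, while a direct WC -> UC change does not.
from enum import Enum


class Caching(Enum):
    WB = "write-back"
    WC = "write-combining"
    UC = "uncached"


def flushes_for(path):
    """Count cache flushes along a sequence of caching states."""
    flushes = 0
    for old, new in zip(path, path[1:]):
        # Only a step that starts at WB and ends at a non-WB type
        # is charged a flush in this model.
        if old is Caching.WB and new is not Caching.WB:
            flushes += 1
    return flushes


direct = [Caching.WC, Caching.UC]
via_wb = [Caching.WC, Caching.WB, Caching.UC]

print(flushes_for(direct))  # direct WC -> UC path
print(flushes_for(via_wb))  # detour through WB
```

Under this assumption the detour through WB costs one flush per page per transition, which adds up quickly if a driver changes caching policy often.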
Comment 4 John W. Linville 2010-02-17 22:06:22 UTC
I have no idea as to how serious a problem the commit fixed.  But I know it is causing a serious problem on my T41.  Maybe I'm wrong, but I don't get the feeling that the patch author is too interested in fixing the problem his patch created.  Since we are late in the 2.6.33 cycle, I would prefer to see this reverted under the "we don't exchange one bug for another" policy.
Comment 5 Pauli 2010-02-17 22:25:34 UTC
(In reply to comment #4)
> I have no idea as to how serious a problem the commit fixed.  But I know it
> is causing a serious problem on my T41.  Maybe I'm wrong, but I don't get the
> feeling that the patch author is too interested in fixing the problem his
> patch created.  Since we are late in the 2.6.33 cycle, I would prefer to see
> this reverted under the "we don't exchange one bug for another" policy.

Does your kernel have PAT enabled?
Comment 6 bjaglin 2010-02-18 01:59:51 UTC
Same problem here on a T40 (Radeon 9000 - RV250), extremely sluggish for >=2.6.33-rc7. Reverting the commit mentioned above seems to fix the problem.

If that's what you are asking, cat /proc/cpuinfo does not show "pat" as a flag on my Pentium M (SL7EG).
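
A minimal sketch of the check described in comment 6, for anyone who wants to script it. The helper name `cpu_has_flag` and the sample text are made up for illustration. Note that this only looks at the CPU flag; whether the kernel was built with PAT support is a separate question.

```python
def cpu_has_flag(cpuinfo_text, flag):
    """Return True if `flag` is listed on any 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Match whole flag names only, so "pat" does not match "patch".
        if sep and key.strip() == "flags" and flag in value.split():
            return True
    return False


# Made-up sample input for illustration, not output from a real machine.
sample = (
    "model name\t: Example CPU\n"
    "flags\t\t: fpu vme de pse tsc msr mce cx8 sep mtrr pge mca cmov\n"
)
print(cpu_has_flag(sample, "pat"))
print(cpu_has_flag(sample, "mtrr"))
```

On a real system the text would come from something like `open("/proc/cpuinfo").read()`.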
Comment 7 Jérôme Glisse 2010-02-18 15:44:56 UTC
Yes, radeon is wrong; I am tracking down why we are doing WC -> UC or UC -> WC.
Comment 8 Jérôme Glisse 2010-02-18 17:02:45 UTC
OK, so it's more likely because on non-PAT systems get_page_memtype always returns -1, so we always do set_wb, and if the pages were already WB that likely kills performance. I don't think we ever do WC -> UC or UC -> WC.
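
The core of the observation in comment 8 is that, without PAT, the lookup gives the same answer for every page, so any decision based on it cannot depend on the page's real state. The sketch below models only that point. All names (`lookup_memtype`, `needs_reset_by_tracking`, the string states) are hypothetical, the behaviour with PAT is a simplification, and tracking the old state in the caller is shown as one conceivable alternative, not necessarily what the attached patches do.

```python
NO_MEMTYPE = -1  # stand-in for the "-1" return value mentioned in comment 8


def lookup_memtype(page_state, pat_enabled):
    """Hypothetical lookup: without PAT it reports -1 for every page."""
    if not pat_enabled:
        return NO_MEMTYPE
    # Simplified PAT case: WB is the untracked default, others are reported.
    return NO_MEMTYPE if page_state == "WB" else page_state


def needs_reset_by_tracking(old_state):
    """Decide from a caching state that the caller remembers itself."""
    return old_state != "WB"


states = ["WB", "WC", "UC"]
without_pat = {s: lookup_memtype(s, pat_enabled=False) for s in states}
with_pat = {s: lookup_memtype(s, pat_enabled=True) for s in states}

print(without_pat)  # same answer for every state
print(with_pat)     # answer depends on the state
print({s: needs_reset_by_tracking(s) for s in states})
```

With the lookup blind on non-PAT systems, code that branches on it treats WB, WC and UC pages identically, whereas a caller-tracked state still distinguishes them.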
Comment 9 Francisco Jerez 2010-02-18 18:53:42 UTC
Created attachment 25107 [details]
ttm_set_caching_nopat.patch

I guess the attached patch fixes it. However it might make sense to fix the PAT code instead because this interface is somewhat annoying.
Comment 10 Francisco Jerez 2010-02-18 19:14:55 UTC
BTW, sorry for not taking care of this before.
Comment 11 John W. Linville 2010-02-19 14:23:30 UTC
Sorry for the delay...

CONFIG_X86_PAT=y

Building w/ patch from comment 9 now.
Comment 12 John W. Linville 2010-02-19 15:00:57 UTC
Yup, that looks good...thanks!
Comment 13 Francisco Jerez 2010-02-19 16:48:47 UTC
Created attachment 25118 [details]
ttm_set_caching_nopat_2.patch

This patch uses a simpler approach; would you mind checking that it has the same effect?
Comment 14 John W. Linville 2010-02-19 18:47:23 UTC
The patch from comment 13 is acceptable, but my "seat of the pants" feeling is that the first patch performed better.  FWIW the load average seemed 20-30% lower with the first patch too, but I didn't do exhaustive testing by any means.
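
For a less "seat of the pants" comparison, the load averages under each patch could be recorded and compared numerically. This is a minimal sketch: the helper names are hypothetical and the sample strings are made-up values in the `/proc/loadavg` layout, not measurements from this bug.

```python
def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def percent_lower(baseline, candidate):
    """How much lower `candidate` is than `baseline`, in percent."""
    return (baseline - candidate) / baseline * 100.0


# Made-up sample values for illustration only.
second_patch = parse_loadavg("0.50 0.40 0.30 1/120 4321")
first_patch = parse_loadavg("0.40 0.35 0.30 1/120 4400")

# Compare the 1-minute averages of the two runs.
print(round(percent_lower(second_patch[0], first_patch[0])))
```

Sampling several times under the same workload and comparing the 5 or 15 minute values would give a steadier picture than a single reading.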
Comment 15 Tobias Jakobi 2010-02-19 22:35:26 UTC
I'm experiencing similar problems on my RV740, which is hooked up through PCIe.

I first noticed the issue when updating drm-radeon-testing and mesa+xf86-video-ati+libdrm to git master.

I'm now using the latest drm-linus branch, which should fix this problem - but apparently not for me. Maybe it's just similar...

Anyway, I now have an average performance of 4fps in CS 1.5. The funny thing is that quake3 is running at around 300fps when not limiting the fps. However, that's the only application I currently have installed that does that. Every other OpenGL game I tested slows down to a crawl.

I also tried both patches here on drm-radeon-testing prior to checking out drm-linus. Applying them there makes no difference to the performance.
Comment 16 Francisco Jerez 2010-02-19 22:49:18 UTC
(In reply to comment #15)
> I'm experiencing similar problems on my RV740, which is hooked up through
> PCIe.
>
> I first noticed the issue when updating drm-radeon-testing and
> mesa+xf86-video-ati+libdrm to git master.
>
> I'm now using the latest drm-linus branch, which should fix this problem -
> but apparently not for me. Maybe it's just similar...
>
> Anyway, I now have an average performance of 4fps in CS 1.5. The funny thing
> is that quake3 is running at around 300fps when not limiting the fps.
> However, that's the only application I currently have installed that does
> that. Every other OpenGL game I tested slows down to a crawl.
>
> I also tried both patches here on drm-radeon-testing prior to checking out
> drm-linus. Applying them there makes no difference to the performance.

Have you tried reverting db78e27de7e2?
Comment 17 Tobias Jakobi 2010-02-19 23:36:50 UTC
I get an error with git when trying to revert the commit, so I just tried the two patches.
Comment 18 Francisco Jerez 2010-02-19 23:44:25 UTC
(In reply to comment #17)
> I get an error with git when trying to revert the commit, so I just tried the
> two patches.

You'll need to revert f0e2f38befa7 first.
Comment 19 Tobias Jakobi 2010-02-19 23:49:06 UTC
I just went back to commit db78e27de7e2 and reverted then - I'm now building that tree. Reporting back once I've done some tests.
Comment 20 Tobias Jakobi 2010-02-20 00:03:17 UTC
No, nothing - doesn't have any effect on performance. Must be something else.
I'm still wondering though since I already went back several weeks in git with xf86-video-ati, libdrm and mesa and the problem still occurs... have to think about what I updated in the past. :(
Comment 21 Rafael J. Wysocki 2010-02-22 21:42:00 UTC
Patch : http://bugzilla.kernel.org/attachment.cgi?id=25118
Handled-By : Francisco Jerez <currojerez@riseup.net>
Comment 22 Rafael J. Wysocki 2010-03-21 19:20:13 UTC
*** Bug 15306 has been marked as a duplicate of this bug. ***
Comment 23 Tobias Jakobi 2010-03-22 16:01:14 UTC
I totally forgot about this one:

To summarize what _my_ problem was...
I found out that the slowdowns were only affecting 32bit applications (which includes everything pushed through wine) and also ut2003, which I used for testing. ioquake3 was compiled for 64bit (and it wasn't affected).
I probably never had any direct HW-accelerated rendering for 32bit applications in the first place, but I didn't notice this because the indirect rendering (which was selected by default, don't ask me why) wasn't that slow. Then either an update to mesa or the mesa gentoo ebuild "fixed" a problem, so that the sw rasterizer was now loaded by default (it looked like loading of the 32bit swrast had failed in the past, resulting in indirect GLX). Of course this resulted in awfully slow rendering.

After creating custom libdrm and mesa libs in /usr/local/lib32/ and also doing a check with 32bit glxinfo the whole rendering is now HW-accelerated, slowdowns are gone for good :)
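
The glxinfo check mentioned above can be scripted roughly as follows. This is a sketch under assumptions: `summarize_glxinfo` is a hypothetical helper, it relies on the `direct rendering:` and `OpenGL renderer string:` lines that glxinfo prints, and the list of substrings used to guess at a software renderer is an illustrative assumption, not an exhaustive or authoritative list.

```python
# Substrings assumed to indicate a software renderer (illustrative only).
SOFTWARE_HINTS = ("software rasterizer", "llvmpipe", "softpipe")


def summarize_glxinfo(output):
    """Extract direct-rendering status and renderer string from glxinfo text."""
    direct = None
    renderer = None
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "direct rendering":
            direct = value.lower().startswith("yes")
        elif key == "OpenGL renderer string":
            renderer = value
    software = renderer is not None and any(
        hint in renderer.lower() for hint in SOFTWARE_HINTS
    )
    return {"direct": direct, "renderer": renderer, "software": software}


# Made-up sample text for illustration, not output from the reporter's system.
sample_output = (
    "direct rendering: Yes\n"
    "OpenGL renderer string: Software Rasterizer\n"
)
print(summarize_glxinfo(sample_output))
```

The sample shows why `direct rendering: Yes` alone is not enough: the renderer string also needs to name the hardware driver rather than a software rasterizer. The check has to be run against the 32bit glxinfo binary to say anything about 32bit applications.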