After loading a recent wireless-testing kernel (based on 2.6.33-rc8) on my T41 I noticed an unacceptable sluggishness. Opening gnome-terminal took a long time, firefox took so long I rebooted before it ever came up, etc. The w command continually reported a load average of between 2 and 3 on an apparently idle box.

A git bisect session identified this patch:

commit db78e27de7e29a6db6be7caf607cf803d84094aa
Author: Francisco Jerez <currojerez@riseup.net>
Date:   Tue Jan 12 18:49:43 2010 +0100

    drm/ttm: Avoid conflicting reserve_memtype during ttm_tt_set_page_caching.

    Fixes errors like:
    > reserve_ram_pages_type failed 0x15b7a000-0x15b7b000, track 0x8, req 0x10
    when a BO is moved between WC and UC areas.

    Reported-by: Xavier Chantry <shiningxc@gmail.com>
    Signed-off-by: Francisco Jerez <currojerez@riseup.net>
    Acked-by: Thomas Hellstrom <thellstrom@vmware.com>
    Signed-off-by: Dave Airlie <airlied@redhat.com>

Just to verify, I went back to my wireless-testing tree and reverted that patch. This restored normal/acceptable performance.

If you need/want testing or whatnot just let me know...
It was also reported here:
http://www.mail-archive.com/dri-devel@lists.sourceforge.net/msg47164.html

I asked curro about it but he had no idea:

17:13 < shining> curro__: did you see "Problem with drm-radeon-testing commit - drm/ttm: Avoid conflicting reserve_memtype during ttm_tt_set_page_caching." ?
17:20 < curro__> shining: not sure about that

It seems to be an issue only for a few radeon users.
Definitely the same commit, and it sounds more-or-less like the same problem, although in the report above it doesn't sound that terrible, whereas on this T41 (single 1.6GHz i686) it cripples the box.
Really, the Radeon driver *should* never change caching policy from WC to UC, and thus never hit the code path affected by that patch. However, if it does, and particularly if it does so frequently, it's quite natural that the patch causes severe CPU overhead, since a WC->WB->UC transition causes a cache flush whereas WC->UC doesn't.

Maybe Jerome has some input?
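To make the cost argument concrete, here is a toy model in Python. It is not kernel code: the `Caching` enum and `count_flushes` are made-up names, and the only rule it encodes is the assumption stated in this comment, namely that a cache flush is paid when a page leaves WB for WC or UC, and not on a direct WC->UC change.

```python
from enum import Enum


class Caching(Enum):
    """Caching states discussed in this thread (toy model only)."""
    WB = "write-back"
    WC = "write-combined"
    UC = "uncached"


def count_flushes(path):
    """Count cache flushes along a sequence of caching states.

    Assumption of this model: one flush is charged each time a page
    leaves WB for a non-cached type (WC or UC), and nowhere else.
    """
    flushes = 0
    for old, new in zip(path, path[1:]):
        if old is Caching.WB and new is not Caching.WB:
            flushes += 1
    return flushes


# Direct transition versus the detour through WB.
direct = count_flushes([Caching.WC, Caching.UC])
via_wb = count_flushes([Caching.WC, Caching.WB, Caching.UC])
print("WC->UC flushes:", direct)
print("WC->WB->UC flushes:", via_wb)
```

Under that assumption the detour costs one flush per page per transition, so a driver that changes caching state frequently would pay it over and over.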
I have no idea as to how serious a problem the commit fixed. But I know it is causing a serious problem on my T41. Maybe I'm wrong, but I don't get the feeling that the patch author is too interested in fixing the problem his patch created.

Since we are late in the 2.6.33 cycle, I would prefer to see this reverted under the "we don't exchange one bug for another" policy.
(In reply to comment #4)
> I have no idea as to how serious a problem the commit fixed. But I know it is
> causing a serious problem on my T41. Maybe I'm wrong, but I don't get the
> feeling that the patch author is too interested in fixing the problem his
> patch created. Since we are late in the 2.6.33 cycle, I would prefer to see
> this reverted under the "we don't exchange one bug for another" policy.

Does your kernel have PAT enabled?
Same problem here on a T40 (Radeon 9000 - RV250), extremely sluggish for >=2.6.33-rc7. Reverting the commit mentioned above seems to fix the problem. If that's what you are asking, cat /proc/cpuinfo does not show "pat" as a flag on my Pentium M (SL7EG).
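For anyone else wanting to check the same thing, a small Python sketch of the "is pat listed in /proc/cpuinfo" test follows. `cpu_has_flag` is a hypothetical helper, and the two sample strings are shortened, made-up `flags` lines for illustration, not output from any machine in this report.

```python
def cpu_has_flag(cpuinfo_text, flag):
    """Return True if `flag` appears as a whole word on any 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            if flag in value.split():
                return True
    return False


# Shortened, illustrative 'flags' lines (not real output from this bug).
sample_without_pat = "model name\t: example cpu\nflags\t\t: fpu vme de pse tsc msr mtrr\n"
sample_with_pat = "model name\t: example cpu\nflags\t\t: fpu vme de pse tsc msr mtrr pat\n"

print(cpu_has_flag(sample_without_pat, "pat"))
print(cpu_has_flag(sample_with_pat, "pat"))

# On a real Linux box one would pass the file contents instead, e.g.:
#   with open("/proc/cpuinfo") as f:
#       print(cpu_has_flag(f.read(), "pat"))
```

Note that this only reports what the CPU advertises; whether the kernel was built with CONFIG_X86_PAT is a separate question.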
Yes, radeon is wrong; I am tracking down why we are doing WC -> UC or UC -> WC.
OK, so it's more likely because on non-PAT systems get_page_memtype always returns -1, thus we always do set_wb, and if the pages were already WB it likely kills performance. I don't think we ever do WC -> UC or UC -> WC.
Created attachment 25107
ttm_set_caching_nopat.patch

I guess the attached patch fixes it. However, it might make sense to fix the PAT code instead because this interface is somewhat annoying.
BTW, sorry for not taking care of this before.
Sorry for the delay...

CONFIG_X86_PAT=y

Building w/ patch from comment 9 now.
Yup, that looks good...thanks!
Created attachment 25118
ttm_set_caching_nopat_2.patch

This patch uses a simpler approach; would you mind checking that it has the same effect?
The patch from comment 13 is acceptable, but my "seat of the pants" feeling is that the first patch performed better. FWIW the load average seemed 20-30% lower with the first patch too, but I didn't do exhaustive testing by any means.
I'm experiencing similar problems on my RV740, which is hooked up through PCIe.

I first noticed the issue when updating drm-radeon-testing and mesa+xf86-video-ati+libdrm to git master.

I'm now using the latest drm-linus branch, which should fix this problem, but apparently not for me. Maybe it's just similar...

Anyway, I now have an average performance of 4fps in CS 1.5. The funny thing is that quake3 is running at around 300fps when not limiting the fps. However, that's the only application I currently have installed that does that. Every other OpenGL game I tested slows down to a crawl.

I also tried both patches here on drm-radeon-testing prior to checking out drm-linus. Applying them there makes no difference to the performance.
(In reply to comment #15)
> I'm experiencing similar problems on my RV740, which is hooked up through
> PCIe.
>
> I first noticed the issue when updating drm-radeon-testing and
> mesa+xf86-video-ati+libdrm to git master.
>
> I'm now using the latest drm-linus branch, which should fix this problem,
> but apparently not for me. Maybe it's just similar...
>
> Anyway, I now have an average performance of 4fps in CS 1.5. The funny thing
> is that quake3 is running at around 300fps when not limiting the fps. However,
> that's the only application I currently have installed that does that. Every
> other OpenGL game I tested slows down to a crawl.
>
> I also tried both patches here on drm-radeon-testing prior to checking out
> drm-linus. Applying them there makes no difference to the performance.

Have you tried reverting db78e27de7e2?
I get an error with git when trying to revert the commit, so I just tried the two patches.
(In reply to comment #17)
> I get an error with git when trying to revert the commit, so I just tried the
> two patches.

You'll need to revert f0e2f38befa7 first.
I just went back to commit db78e27de7e2 and reverted then - I'm now building that tree. Reporting back once I've done some tests.
No, nothing - doesn't have any effect on performance. Must be something else. I'm still wondering though since I already went back several weeks in git with xf86-video-ati, libdrm and mesa and the problem still occurs... have to think about what I updated in the past. :(
Patch: http://bugzilla.kernel.org/attachment.cgi?id=25118
Handled-By: Francisco Jerez <currojerez@riseup.net>
*** Bug 15306 has been marked as a duplicate of this bug. ***
I totally forgot about this one. To summarize what _my_ problem was...

I found out that the slowdowns were only affecting 32bit applications (which includes everything pushed through wine) and also ut2003, which I used for testing. ioquake3 was compiled for 64bit (and it wasn't affected).

I probably never had any direct HW-accelerated rendering for 32bit applications in the first place, but I didn't notice this because the indirect rendering (which was selected by default, don't ask me why) wasn't that slow. Then either an update to mesa or the mesa gentoo ebuild "fixed" a problem, so that the sw rasterizer was now loaded by default (it looked like loading of the 32bit swrast failed in the past, resulting in indirect GLX). Of course this resulted in awfully slow rendering.

After creating custom libdrm and mesa libs in /usr/local/lib32/ and also doing a check with 32bit glxinfo, the whole rendering is now HW-accelerated, slowdowns are gone for good :)
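In case it helps someone who lands here with the same symptoms, this is roughly the glxinfo check expressed as a Python sketch. It relies on the "direct rendering:" and "OpenGL renderer string:" lines that glxinfo prints; `summarize_glxinfo` and `looks_like_software` are hypothetical helpers, the keyword list is only a heuristic, and both sample texts are made up for illustration rather than copied from my machine.

```python
def summarize_glxinfo(text):
    """Pick the direct-rendering flag and renderer string out of glxinfo text."""
    info = {"direct": None, "renderer": None}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "direct rendering":
            info["direct"] = value.lower().startswith("yes")
        elif key == "OpenGL renderer string":
            info["renderer"] = value
    return info


def looks_like_software(info):
    """Heuristic: the renderer name hints at a software rasterizer."""
    renderer = (info["renderer"] or "").lower()
    return any(word in renderer for word in ("software", "swrast", "softpipe", "llvmpipe"))


# Made-up sample texts for illustration only.
sample_sw = "direct rendering: Yes\nOpenGL renderer string: Software Rasterizer\n"
sample_hw = "direct rendering: Yes\nOpenGL renderer string: Example GPU renderer\n"

for sample in (sample_sw, sample_hw):
    info = summarize_glxinfo(sample)
    print(info["direct"], info["renderer"], looks_like_software(info))
```

The point is to run it against the output of the 32bit glxinfo as well as the 64bit one, since in my case only the 32bit side had fallen back to software rendering.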