Bug 26982 - drm-next/linux-next glxgears CPU usage doubles for 32bit, doesn't on 64bit
Summary: drm-next/linux-next glxgears CPU usage doubles for 32bit, doesn't on 64bit
Status: RESOLVED INSUFFICIENT_DATA
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: i386 Linux
Importance: P1 normal
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-01-17 23:17 UTC by Chris Kennedy
Modified: 2011-02-01 05:52 UTC

See Also:
Kernel Version: 2.6.37-rc7-next-20101228+
Subsystem:
Regression: Yes
Bisected commit-id:


Description Chris Kennedy 2011-01-17 23:17:15 UTC
Somewhere in the drm-next and linux-next trees, between December 13th (2.6.37-rc5-next-20101213+) and 27th (2.6.37-rc7-next-20101228+), a change roughly doubled the CPU usage on a 32-bit system when running any application that uses DRM vblank/page flipping, including glxgears (tested on an ATI Radeon 4350 card at least).  This doesn't happen on 64-bit: it still uses the same amount of CPU as before, while the 32-bit kernel uses almost double.  This can be seen in MAME using Mesa/DRI and page flipping (classic), and glxgears shows it too, although less dramatically since it's a very low CPU usage app anyway.  To show the scale, an app that used to take 80% CPU on one core now takes 130-150% and needs a second core.

I suspect, from looking at the changes between those dates, that the only real change which could do this is the vblank timestamp work added during that time.  I have tried taking just the DRM code from these kernels (drm-next) and putting it into 2.6.36.2, and it still happens, so it has nothing to do with the other parts of the kernel.  It seems to be centered somewhere in the DRM/Radeon code.

Reproducing it should be easy: take the newest linux-next version and benchmark glxgears or any application that uses vblank (with the newest Mesa and DRI, X Windows from Git and the xf86-ati driver from the pageflipping branch).  Comparing a 32-bit build to a 64-bit build, the difference should be pretty noticeable; it is here at least.  I am using Gentoo with the newest X Windows Git overlay, and linux-next of course.  This is on a Core 2 Duo system from HP, an xw4400 workstation, with a Radeon 4350 card using KMS and page flipping/vblank.
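
One way to put numbers on the 32-bit vs 64-bit comparison is to sample the process's accumulated CPU time from /proc over a fixed interval. This is only a sketch: the helper names proc_ticks and ticks_to_percent are made up for illustration, and it assumes a Linux /proc filesystem and a command name without spaces.

```shell
# proc_ticks PID: CPU time used so far by PID, in clock ticks
# (utime + stime, fields 14 and 15 of /proc/PID/stat; assumes the
# command name contains no spaces, which holds for glxgears).
proc_ticks() {
    awk '{ print $14 + $15 }' "/proc/$1/stat"
}

# ticks_to_percent DELTA_TICKS SECONDS [TICKS_PER_SECOND]:
# average CPU usage as a percentage of one core.
ticks_to_percent() {
    awk -v d="$1" -v s="$2" -v hz="${3:-$(getconf CLK_TCK)}" \
        'BEGIN { printf "%.1f\n", d / hz / s * 100 }'
}

# Example use (run once on the 32-bit build, once on the 64-bit build):
#   glxgears >/dev/null & pid=$!
#   before=$(proc_ticks "$pid"); sleep 20; after=$(proc_ticks "$pid")
#   ticks_to_percent $((after - before)) 20
#   kill "$pid"
```

Because /proc/PID/stat accounts for all threads of the process, a multi-threaded app can report more than 100%, which matches the way the 130-150% figure above is expressed.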
Comment 1 Mario Kleiner 2011-01-21 19:19:45 UTC
Not sure what the reason is, but it's probably not the vblank timestamp or pageflipping stuff (Disclaimer: I'm the author of the vblank patches). I just used ftrace on two 32-bit kernel versions:

* The 2.6.37-rc5 kernel (from drm-next) I used to develop and test the patches - it has the vblank timestamping and radeon KMS pageflip patches applied, but nothing much further.

* The current drm-next kernel snapshot from 10.1.2011, as packaged by Ubuntu for my Ubuntu 10.10 test system.

I traced with and without pageflipped fullscreen swap, with windowed swap (no pageflipping) and with graphics idle, but vblank IRQs still on.

Test system was a Core2Duo 2.2 GHz (Apple MacBookPro 2,1) with ATI Mobility X1600 (RV530). The KMS driver takes basically the same path of execution for my GPU and yours when handling vblanks or pageflips.

Basically all the work of those patches and of the pageflip patches (which don't involve loops or other things with possibly unbounded execution time) is done in the vblank IRQ handler, so excessive time spent in those handlers shouldn't show up in the CPU usage of your graphics app, but rather affect whatever is running on the system. The only interaction with userspace when scheduling and completing swaps is with the X server, not directly with the client app, so those bits should only show up in X's CPU usage.

ftrace'ing the execution time of the functions in the irq handlers shows less than 0.1 msecs of time spent per page-flipped swap or vblank irq under graphics load, less than 0.05 msecs for interaction with userspace (X-server) during preparation of a pageflip. If you assume a 60 Hz refresh/swap this will end up using less than 1% cpu on one core for swaps, less than 0.5% when idle with vblanks turned on.
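
As a rough check of those percentages: the load is just the per-event time multiplied by the event rate. A minimal sketch, where the helper name cpu_percent is made up for illustration and the inputs are the upper bounds quoted above:

```shell
# cpu_percent MS_PER_EVENT EVENTS_PER_SECOND:
# share of one core spent on the events, in percent.
cpu_percent() {
    awk -v ms="$1" -v hz="$2" 'BEGIN { printf "%.2f\n", ms * hz / 1000 * 100 }'
}

cpu_percent 0.1 60     # swap or vblank irq under load, 60 Hz
cpu_percent 0.05 60    # interaction with the X server, 60 Hz
```

These come out at 0.60 and 0.30 percent of one core, consistent with the "less than 1%" and "less than 0.5%" bounds.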

You could play with various things to see if it changes something:

1. Running the app non-fullscreen (and without desktop compositor), so it doesn't use pageflipping. Or disable pageflipping in xorg.conf via driver option: Option "EnablePageFlip" "no", to rule out pageflip related problems.

2. echo 0 > /sys/module/drm/parameters/timestamp_precision_usec
This will disable almost all cleverness in the vblank timestamping, see if it makes a change.
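
For the second experiment it may be convenient to save the current value so it can be restored afterwards. A small sketch, assuming the parameter is exposed under /sys/module/ (singular), which is where the kernel publishes module parameters; swap_param is a hypothetical helper name and writing the file needs root:

```shell
# swap_param FILE VALUE: print the current contents of FILE, then write VALUE.
swap_param() {
    old=$(cat "$1") || return 1
    printf '%s\n' "$2" > "$1" || return 1
    printf '%s\n' "$old"
}

# Assumed location of the DRM module parameter (needs root to write).
PARAM=/sys/module/drm/parameters/timestamp_precision_usec

# Example use:
#   saved=$(swap_param "$PARAM" 0)             # disable precise timestamping
#   ... run the benchmark ...
#   swap_param "$PARAM" "$saved" >/dev/null    # restore the previous value
```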

Or using ftrace:

cd /sys/kernel/debug/tracing/

echo radeon_crtc_page_flip > set_graph_function
echo function_graph > current_tracer
cat trace_pipe > /tmp/trace.out &
sleep 5 ; echo 1 > tracing_enabled ; sleep 20 ; echo 0 > tracing_enabled
more /tmp/trace.out

-> Gives you execution time of the pageflip ioctl() -- what should account against x-server's cpu usage.

echo radeon_crtc_handle_flip > set_graph_function
-> Gives you time spent in irq handler for performing the page flip.

echo drm_handle_vblank > set_graph_function
-> Gives you time spent in vblank irq handler for vblank timestamping.

This should all report fairly low numbers (not much more than maybe 100-200 usecs, not the multi-millisecond range needed to create the cpu load you report).
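
To compare a captured trace against that expectation without paging through it, the largest reported duration can be pulled out with awk. This is a sketch only: max_us is a made-up helper name, and it assumes the usual function_graph layout in which each duration is printed as a number followed by "us".

```shell
# max_us FILE: print the largest duration (in microseconds) found in
# function_graph output, where durations appear as "<number> us".
max_us() {
    awk '
        /^#/ { next }    # skip header lines
        {
            for (i = 1; i < NF; i++)
                if ($(i + 1) == "us" && $i ~ /^[0-9]+(\.[0-9]+)?$/ && $i + 0 > max)
                    max = $i + 0
        }
        END { printf "%.3f\n", max + 0 }
    ' "$1"
}

# Example use, after capturing a trace as described above:
#   max_us /tmp/trace.out
```

For scale: an extra 50% of one core at 60 Hz works out to about 8.3 msecs per frame (0.5 * 1000 / 60), so a maximum in the 100-200 usecs range would be far too small to explain the reported load.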

I don't observe increased CPU load for my test apps, but my X server, libdrm and Mesa are not at the latest git versions; they are somewhere at the state of November/December last year, iirc. So there could be some source of bad interaction with the latest kernel?

thanks,
-mario
Comment 2 Anonymous Emailer 2011-01-21 20:13:57 UTC
Reply-To: garry.hurley.jr@gmail.com

Silly question, and probably not relevant with modern chipsets, but could this be a problem with hardware acceleration versus software acceleration? Or maybe caching video RAM into System RAM?  I know the former used to be a big factor in the earlier GPUs (3dfx Voodoo 2 and so forth) and for a while, the best solution was to disable hardware acceleration in Linux.  Maybe there is a problem on Chris's card that the hardware acceleration is not working properly?  The latter would only affect CPU cycles during the cache update cycles, but if the cache was being refreshed at a fairly fast rate, it could cause some serious performance lags.

Just a couple of thoughts from the far left of the peanut gallery.

Comment 3 Chris Kennedy 2011-01-23 06:11:42 UTC
I'll need to double-check using the option to turn off pageflipping and see if the timestamp method change makes a difference.  It's odd; I'm basically using this as a way to run Mame on arcade monitors in full-screen mode at the original game resolutions.  So these aren't big fancy resolutions or heavy-duty requirements, just things like 320x240 at 60Hz mostly.  I'll post the results of those tests when I get a chance; I have not focused on it in the last few days but will hopefully do that in the next week.  Thanks for checking deeper into the issue. It sounds like the cause is quite possibly somewhere other than the page flip and vblank timestamp changes.

Thanks,
Chris
Comment 4 Chris Kennedy 2011-02-01 05:52:21 UTC
I figured this out, or at least I fixed it by downgrading.  I downgraded in Gentoo from the Git X11 overlay in layman to the stable/standard Gentoo X11 programs and libraries.  After a full update, everything now runs at the same CPU usage on both 32-bit and 64-bit.  I'm not sure what the cause was; I just know that it's fixed now while using the same kernel, so it's not the kernel at all.  I suspect that either a certain revision of the Xorg components in git caused it, or it's still there when using Xorg fresh from git.  I'm happy now though, since this also fixed a few other issues I saw, and Gentoo has moved to Xorg server 1.9.2, which does what I need, so I am no longer using the Git versions from the X11 overlay.

Thanks,
Chris
