Bug 15186

Summary: Radeon KMS: [RV730] Garbled kwin shadows and pixmaps
Product: Drivers
Reporter: Robert Schedel (r.schedel)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: CLOSED CODE_FIX
Severity: normal
CC: crazy-ivanovic, neuro, rjw, shawn.starr
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.33-rc4
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 14885
Attachments:
  Xserver log
  Screenshot of konsole window executing "x11perf -create"
  Screenshot of taskbar part with garbled window entry
  Kernel config
  Flush HDP before IB
  Force hdp flush on set_domain(CPU) & wait_idle
  Proper fix

Description Robert Schedel 2010-01-31 11:56:11 UTC
Created attachment 24816 [details]
Xserver log

Since 2.6.33-rc4 up to rc6 I randomly have garbled kwin shadows, either blue or yellow frames around the window shadows.
In kernels 2.6.32 and <=2.6.33-rc3 this was not observable (=regression).

Also described here by someone else:
http://bbs.archlinux.org/viewtopic.php?pid=690047#p690047

The effect happens randomly when switching between windows. However, it is also 100% reproducible for me with the command
  x11perf -create
After this the window has a white shadow frame (see screenshot). The window entry in the taskbar is sometimes garbled with a white pixmap too (see screenshot).

If running with kwin composite *disabled*, only the white pixmap is observed -- as window shadow effects are disabled.

Neither the kernel log nor the Xserver log provides any useful hints.

Bisecting the git kernel shows this as culprit:
> [c07d7237a639d57dc91ea7efdbc1b3f85c7a095d] Merge branch 'drm-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
> c07d7237a639d57dc91ea7efdbc1b3f85c7a095d is the first bad commit

Bisecting branch "drm-linus" further showed this culprit:
> cafe6609d6dc0a6a278f9fdbb59ce4d761a35ddd is the first bad commit
> commit cafe6609d6dc0a6a278f9fdbb59ce4d761a35ddd
> Author: Jerome Glisse <jglisse@redhat.com>
> Date:   Thu Jan 7 12:39:21 2010 +0100
>
> drm/radeon/kms: Schedule host path read cache flush through the ring V2
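
For readers unfamiliar with the procedure: a bisect like the two runs quoted above is essentially a binary search over a commit range with one known-good and one known-bad end. The following is only a minimal sketch in Python, not what git does internally. The commit labels, the `culprit` value and the `shows_corruption` predicate are hypothetical placeholders; in the real session the predicate was "build, boot, run x11perf -create and look for the white frame".

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, the first one is good,
    the last one is bad, and the history flips from good to bad only once.
    """
    lo, hi = 0, len(commits) - 1  # commits[lo] is known good, commits[hi] known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the first bad commit is at mid or earlier
        else:
            lo = mid  # the first bad commit is after mid
    return commits[hi]


# Placeholder history: labels only, not real kernel commits.
history = ["c%02d" % i for i in range(16)]
culprit = "c11"  # hypothetical first bad commit


def shows_corruption(commit):
    """Stand-in for the manual test; bad from the culprit onwards."""
    return history.index(commit) >= history.index(culprit)


print(first_bad(history, shows_corruption))
```

With 16 candidate commits this needs four test builds instead of up to fifteen, which is why bisecting a whole merge and then the merged branch stays practical.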

After reverting this patch on 2.6.33-rc6, the issue was resolved on my hardware configuration. However, I am unsure about the internals of this commit.

Hopefully this information is sufficient for an expert to track this further.

Environment:
Linux 2.6.33-rc4..6 with Radeon KMS ("modprobe radeon modeset=1")
Mesa master branch
xf86-video-ati master branch
Xorg server 1.7.4
libdrm 2.4.17 (or master too)
KDE 4.3.95 (composite desktop enabled or disabled, see above)
Comment 1 Robert Schedel 2010-01-31 11:58:23 UTC
Created attachment 24817 [details]
Screenshot of konsole window executing "x11perf -create"

Konsole window reproducibly has a garbled white frame after executing x11perf -create (probably due to the large number of window blits).
Comment 2 Robert Schedel 2010-01-31 11:59:14 UTC
Created attachment 24818 [details]
Screenshot of taskbar part with garbled window entry
Comment 3 Robert Schedel 2010-01-31 12:13:49 UTC
Created attachment 24821 [details]
Kernel config
Comment 4 Jérôme Glisse 2010-02-01 11:50:18 UTC
Created attachment 24851 [details]
Flush HDP before IB

Please check whether this patch fixes the issue (don't revert the other HDP commit, just apply this patch on top of the latest 2.6.33-rc).
Comment 5 Robert Schedel 2010-02-01 18:49:32 UTC
Applied patch on top of vanilla 2.6.33-rc6:
Unfortunately not fixed, no visible impact.

"x11perf -create" still triggered white window frames. Also changing focus between X11 windows still triggered spurious blue/yellow shadow frames around the windows.
Comment 6 Rafael J. Wysocki 2010-02-01 21:14:52 UTC
First-Bad-Commit : cafe6609d6dc0a6a278f9fdbb59ce4d761a35ddd
Handled-By : Jerome Glisse <jglisse@redhat.com>
Comment 7 Jérôme Glisse 2010-02-02 12:58:46 UTC
Created attachment 24870 [details]
Force hdp flush on set_domain(CPU) & wait_idle

Please apply this patch on top of a clean 2.6.33-rc* and check that it fixes the issue. This patch is equivalent to reverting the commit you pointed out, so it should definitely solve the issue, but it's a hack. Once you have checked that it effectively fixes the issue, change the set_domain #if 1 to #if 0 and retest. Then change it back to #if 1, change the #if 1 of wait_idle to #if 0, retest, and report the results. Thanks
Comment 8 Robert Schedel 2010-02-02 18:54:24 UTC
Applied second patch on top of vanilla 2.6.33-rc6:
- Both #if 1's: *Fixed*
- set_domain #if 0, wait_idle #if 1: *Fixed*
- set_domain #if 1, wait_idle #if 0: *Not fixed* (same effect as before)

The wait_idle part seems important.
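
The reasoning behind that conclusion can be written down as a small truth table. This is only an illustrative sketch in Python: the tuple layout and the `explains` helper are made up for this note, and the three rows are exactly the three results listed above (the fourth combination, both set to #if 0, was not tested).

```python
# (set_domain flush enabled, wait_idle flush enabled) -> corruption fixed?
results = {
    (True, True): True,    # both #if 1
    (False, True): True,   # set_domain #if 0, wait_idle #if 1
    (True, False): False,  # set_domain #if 1, wait_idle #if 0
}


def explains(index):
    """True if the toggle at `index` alone predicts every observed outcome."""
    return all(flags[index] == fixed for flags, fixed in results.items())


print("set_domain alone explains the results:", explains(0))
print("wait_idle alone explains the results:", explains(1))
```

Only the wait_idle toggle matches all three observations, which is consistent with the flush in wait_idle being the part that matters on this hardware.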
Comment 9 Shawn Starr 2010-02-02 21:43:52 UTC
I have 2.6.33-rc6 from today's linux-2.6 snapshot from Linus's tree
I tried glisse's latest kms hack patch but I still have text glyph
corruption so it doesn't seem to fix it for me. 

I have a radeon HD 3650 mobile (RV635) FireGL.
Comment 10 Robert Schedel 2010-02-02 22:27:58 UTC
I never observed text glyph corruption.

@Shawn Starr: Are you sure you observed the same effect? Did you e.g. bisect it to the same commit? Or did you observe the same window corruptions as described by me, and at least those are resolved now or still unresolved for you?

I guess we should avoid mixing up all possible (current/future) corruptions in the same ticket.
Comment 11 Shawn Starr 2010-02-02 23:05:09 UTC
Corruption is still unresolved, see: http://www.sh0n.net/spstarr/radeon/glyph-corruption.jpg
Comment 12 Michel Dänzer 2010-02-03 09:21:34 UTC
Shawn, please stop spamming this report with your unrelated issue.
Comment 13 Nils Kneuper 2010-02-03 12:11:11 UTC
I observe the same issues as the original poster, though for me the issue exists whenever I use KMS. It makes no difference whether I use 2.6.32* (the first time I tried KMS with my rv670 based card) or 2.6.33-rc6. It also makes no difference if I change the acceleration method from OpenGL to XRender.
I have not checked the patch provided in this thread yet; I will do so later and check whether the issues persist. In general I guess that it is not a bug in mesa, since the issue does not occur without KMS. I will get rc6, test with your patch and report back.

System info:
lspci | grep VGA
01:00.0 VGA compatible controller: ATI Technologies Inc RV670PRO [Radeon HD 3850]

Gentoo "unstable" amd64
Linux 2.6.32 up to (at least) 2.6.33-rc5 with Radeon KMS (grub boot param: radeon.modeset=1)
Mesa git version, master branch
xf86-video-ati git version, master branch
xorg-server-1.7.4.901 (happened with other xorg-server 1.7.x versions, too)
libdrm git version, master branch
KDE 4.3.5 (composite desktop enabled or disabled), same with previous 4.3.x versions
Comment 14 Nils Kneuper 2010-02-03 13:18:43 UTC
(In reply to comment #8)
> Applied second patch on top of vanilla 2.6.33-rc6:
> - Both #if 1's: *Fixed*
> - set_domain #if 0, wait_idle #if 1: *Fixed*
> - set_domain #if 1, wait_idle #if 0: *Not fixed* (same effect as before)
> 
> The wait_idle part seems important.

Okay, after my tests with 2.6.33-rc6 I do have exactly the same results! So the "wait_idle #if 1" part of the patch basically does the trick. Just to be 100% sure I checked if the issues do occur with vanilla 2.6.32, too, and they are there just as with an unpatched 2.6.33-rc6 for me. So it might in fact be that the "real" issue (at least for me) is something else but the change strangely does fix things.

As a note on the side: it makes no difference if qt 4.5.x or 4.6.x is used, had the issues with both.
Comment 15 Robert Schedel 2010-02-03 21:14:07 UTC
After the last comment I double-checked vanilla 2.6.32 again, and Nils is in fact right: vanilla 2.6.32 is also broken. I had it in use for longer, but either my observation was wrong, or the difference is my newer 100% x11perf test scenario, or some other software dependency masked the problem at that time. The effect also looks more like flickering and is less permanent in that version.

After further retests this means for kernel versions:
- 2.6.32 (and some older ones): FAILED.
- 2.6.33-rc3: OK.
- 2.6.33-rc4..rc6: FAILED.

Today I ran an inverse bisect session, looking for the first "good" commit (the fix) before the above-mentioned bad commit. I finally found this one:
>commit 23956dfa82eab95931aab5fa9886c1e96c41e4dc
>Author: Dave Airlie <airlied@redhat.com>
>Date:   Mon Nov 23 12:01:09 2009 +1000
>
>    drm/radeon/kms: add HDP flushing for all GPUs.

Sounds reasonable.

I am unsure whether you want to keep the regression flag of this ticket. Actually, the point releases always used to be "broken" -- only intermediate rc's were fixed and re-broken :)
Comment 16 Michał Witkowski 2010-02-04 11:26:43 UTC
I can confirm Robert's problem. I've got a Lenovo U330 with an ATI 3450. Running either kernel 2.6.32 or 2.6.33rc4-rc6 causes DRI2 (KMS enabled) to show these yellow/white stripes on Kwin shadows. Reverting to DRI (UMS) fixes the problem.
I can also reproduce the problem 100% of the time using gtkperf on either 2.6.32 or 2.6.33rc4-rc6.
Comment 17 Jérôme Glisse 2010-02-04 16:15:44 UTC
Created attachment 24909 [details]
Proper fix

Can you check that this patch also fixes the issue? It's a proper fix; if it does, I will ask Dave to put it in the next radeon drm fixes pull.
Comment 18 Robert Schedel 2010-02-04 18:03:46 UTC
Applied last patch to vanilla 2.6.33-rc6: In fact fixed.
Will report back if I find any adverse effects in the long run.
Thanks, Jérôme.
Comment 19 Michał Witkowski 2010-02-04 18:38:38 UTC
Great, this patch fixed the problem for me too (applied to clean 2.6.33-rc6) :)

Thanks, Jérôme
Comment 20 Nils Kneuper 2010-02-04 19:00:09 UTC
Yes, the "proper fix" seems to work nicely over here, too. Will report if any issues arise again. Thanks for fixing this, Jérôme.
Comment 21 Robert Schedel 2010-02-07 10:58:54 UTC
Applied in 2.6.33-rc7 mainline, regression resolved.
Comment 22 Michał Witkowski 2010-03-06 10:58:35 UTC
I just switched from 2.6.33-rc7 to 2.6.33 final. I also updated mesa/libdrm/glproto/xf86-video-ati from GIT 20100217 to 20100306. The problem with the yellowish Kwin shadows is back. Was this fix reverted?
Comment 23 Nils Kneuper 2010-03-06 11:11:57 UTC
I have been using 2.6.33 for a while now and not seen any issue so far with KDE/kwin 4.x. So for me things seem to still be fixed.
Comment 24 Robert Schedel 2010-03-06 11:48:10 UTC
I agree with Nils: I constantly pull the git master branches of mesa and xf86-video-ati, most recently a few minutes ago. With 2.6.33 it is still fixed.
Suggestion: Revert recent updates one by one, bisect, write new ticket (ref to this).