When using DPM with my Radeon HD6850 I get frequent "GPU lockup" errors, with the screen freezing for a few seconds. Sometimes (but not often) the system freezes completely and a hard reset is required. It happens both while playing OpenGL games and during normal (not GPU-intensive) activities, slightly more often during games.
I have the issue with both kernel 3.12 (with DPM manually enabled) and 3.13-rc6 (where it's enabled by default). With DPM disabled, everything works fine. I'm using Mesa 10.0.1, but I had the problem with Mesa 9.2.2 on kernel 3.12 too (I didn't try Mesa 9.2.2 with kernel 3.13-rc6).
I'm using 64-bit Debian GNU/Linux, and all packages are from the Debian archive (Mesa 10 and Linux 3.13-rc6 from experimental), but I don't think the problem is Debian-specific, so I prefer to report it here.
I've attached a dmesg log containing two of those lockups.
The card itself is a Sapphire HD6850 with 1 GB of GDDR5 memory, the CPU is an Intel Core 2 Duo E8400, the memory is 2x2 GB of DDR2, and the motherboard is an ASUSTeK P5E3.
Feel free to ask for any additional information, or any test I could perform.
Created attachment 121641
DMESG containing two lockups (and boot messages)
Please try the suggestions on that bug.
Are things any better with my 3.14 branch?
Sorry, but it seems even worse: I get the lockup just as often, but it no longer recovers. Either the system freezes completely after a lockup (with garbage on the screen), or it keeps freezing and resetting in a loop.
By switching to VT1 after a freeze I managed to capture a dmesg while it was in a freeze/reset loop.
With dpm=0 it works fine on your branch (like it does on the 3.12 or 3.13-rc6).
PS: I get the version string "3.13.0-rc4+" with your branch; is that right, or did I use the wrong version? I got it with "git clone git://people.freedesktop.org/~agd5f/linux" and then "git checkout drm-next-3.14".
Created attachment 121681
DMESG with the drm-next-3.14 branch
Created attachment 121691
The VBIOS (as requested in https://bugs.freedesktop.org/show_bug.cgi?id=73053)
Created attachment 121701
Xorg.log (with dpm enabled)
Created attachment 121711
Xorg.log (with dpm disabled)
(In reply to kilobug from comment #4)
> Sorry, but it seems even worse : I get the "lockup" as often, but it doesn't
> recover from it anymore. Either the system fully freeze after a lockup (with
> garbage on the screen), or it keeps freezing and resetting in loop.
Have you tried this patch from comment 5 of the other bug?
> By switching to VT1 after a freeze I managed to do a dmesg while it was in a
> freeze/reset loop.
> With dpm=0 it works fine on your branch (like it does on the 3.12 or
> PS : I get the version string "3.13.0-rc4+" with your branch, is that right
> or did I use the wrong version ? I got it with "git clone
> git://people.freedesktop.org/~agd5f/linux" and then "git checkout
Yes, that is correct, it's based on Dave's drm-next branch.
I tried the patch with various kinds of programs (3D games, 2D games, desktop apps, mplayer, ...). It worked somewhat better, but I still got lockups with some programs (I'll send the dmesg), and even some complete freezes (for example while playing a video with mplayer).
If needed, I can run more tests with various programs (games, Phoronix Test Suite, ...) and report the result for each program (works well, lockup with recovery, freeze). But this will take some time (I'll need to reboot quite a lot), so I'll only be able to do it during the week-end, and only if it's useful. Also, should I limit myself to free software, or should I include some non-free games in the mix?
Created attachment 122201
DMESG : with the drm-next branch and the patch from https://bugzilla.kernel.org/show_bug.cgi?id=68571#c5
Does disabling hyperZ help? E.g., set env var R600_DEBUG=nohyperz in /etc/environment or however your distro handles global env vars.
I tried it; it didn't help. I also tried removing the "R600_DEBUG=sb" setting I had there; that didn't help either.
Note: I tried both with the patched drm-next kernel. Should I try with another build?
Is there anything else I can do to help locate/fix the issue?
Created attachment 123901
dmesg with two gpu lockups
I think I have the same problem with kernel 3.13 on Arch Linux with DPM enabled.
The GPU is an MSI HD 6870 Hawk.
Here is the dmesg output after two GPU lockups.
A small hint: it almost always happens when playing a video (mplayer or browser).
No problems if I append radeon.dpm=0 to the kernel command line.
I forgot to say: the problem started after I switched my motherboard and CPU a week ago.
Now I use a Gigabyte H87-HD3 board with an Intel Xeon E3-1230 v3 CPU. Before, it was an AMD CPU on a Foxconn A7DA-S board.
Maybe it has something to do with the switch from PCI Express 2.0 to 3.0?
I have a PCI Express 2.0 board (and the lockups), but I have an Intel CPU. I'm not sure whether it's linked to Intel vs AMD CPUs or to something else about the motherboard. I can include a dmidecode output if it's of any use.
You are right, it's not PCI Express 2.0 vs. 3.0. I forced 2.0 in the BIOS, with no improvement.
I also tried the drm-next-3.14 branch from Alex's repository. It made things even worse: it freezes during boot, whereas before I reached KDM and sometimes even saw the desktop.
After applying the patch from https://bugzilla.kernel.org/show_bug.cgi?id=68571#c5 I get the same behaviour as with the stock 3.13 kernel.
I discovered that there is no kernel panic. I can get back to KDM with the SysRq keys (Alt + Print + K).
Maybe it's because my card is slightly overclocked by the factory: http://msi.com/product/vga/R6870-Hawk.html (930 MHz vs 900 MHz for the reference card).
Is there anything else I can do?
Created attachment 124071
dmesg from drm-next 3.14 with patch
This is the dmesg log after a GPU freeze. The kernel is built from the drm-next-3.14 branch, with the patch from https://bugzilla.kernel.org/show_bug.cgi?id=68571#c5 applied on top.
Created attachment 124081
The vbios from the MSI R6870 Hawk
Downloaded the vbios as described in https://bugs.freedesktop.org/show_bug.cgi?id=73053
(In reply to perry3d from comment #18)
> You are right, its not PCIExpress 2.0 vs. 3.0. I forced it in the bios, no
> I also tried the drm-next-3.14 branch from Alex repository. Made it even
> worse: freeze on boot. Before i reached kdm and sometimes i saw the desktop.
> After applying the patch from
> https://bugzilla.kernel.org/show_bug.cgi?id=68571#c5 i am on the same
> behaviour like i was with stock kernel 3.13.
> I discovered that there is no kernel panic. I can get back to KDM with the
> SysRQ keys (Alt + Print + K).
> Maybe its because my card is slightly overclocked by the factory:
> http://msi.com/product/vga/R6870-Hawk.html (in compare with the reference
> card: 930Mhz vs 900Mhz).
> Something else i can do?
I may be wrong here, but I think that for now (something I worked on with Alex Deucher in a different bug), overclocked cards are limited to the reference values.
That's no problem, as long as they are stable :).
But I think you are wrong: in the dmesg output the card is set to 930 MHz (if I read it correctly), and the stock 6870 uses 900 MHz. Maybe the voltage is not adjusted correctly? What's the effect of an undervolted card?
(In reply to perry3d from comment #22)
> That's no problem, as long as they are stable :).
> But i think you are wrong: in the dmesg output the card is set to 930 Mhz
> (if i read it correctly). And the stock 6870 uses 900Mhz. Maybe the voltage
> is not adjusted correctly? Whats the effect of an undervolted card?
Yes, but if I remember correctly again, the real values are not printed out in kernel <= 3.13 if they were limited. I think I've seen a patch from Alex or Christian about that so we can have the real values in kernel 3.14.
Otherwise, I could provide you a patch of mine that gives the real values used (it's printed as a warning saying the values have been limited to X). I'm experiencing similar problems with my HD6950, that's why I've been digging on my side.
For reference the patch is here:
There is no overclocking on my card: the reference HD6850 runs at 775 MHz (according to http://www.amd.com/us/products/desktop/graphics/amd-radeon-hd-6000/hd-6850/Pages/amd-radeon-hd-6850-overview.aspx#2) and the Sapphire has the same frequency, according to both the Sapphire website (http://www.sapphiretech.com/presentation/product/?pid=497&lid=1) and the radeon-profile utility.
Today I built the drm-fixes-3.14 branch from git://people.freedesktop.org/~agd5f/linux. No problems so far. It would be interesting to know which commit fixed the GPU lockups.
dpm is disabled by default again on these asics until we fix this issue.
DPM is enabled by passing radeon.dpm=1 in the kernel parameters:
root@perry64 # cat /sys/kernel/debug/dri/64/radeon_pm_info
uvd vclk: 0 dclk: 0
power level 2 sclk: 93000 mclk: 105000 vddc: 1185 vddci: 1150
root@perry64 # cat /sys/kernel/debug/dri/64/radeon_pm_info
uvd vclk: 0 dclk: 0
power level 0 sclk: 10000 mclk: 15000 vddc: 950 vddci: 950
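A minimal Python sketch for turning those radeon_pm_info "power level" lines into readable units. It assumes, consistent with the 930 MHz figure mentioned earlier in this thread, that sclk/mclk are reported in units of 10 kHz and vddc/vddci in millivolts; the helper name parse_power_level is hypothetical and not part of any radeon tool.

```python
import re

# Matches one "power level" line from radeon_pm_info, e.g.
# "power level 2 sclk: 93000 mclk: 105000 vddc: 1185 vddci: 1150"
POWER_LEVEL_RE = re.compile(
    r"power level (\d+)\s+sclk:\s*(\d+)\s+mclk:\s*(\d+)"
    r"\s+vddc:\s*(\d+)\s+vddci:\s*(\d+)"
)


def parse_power_level(line):
    """Return level index, clocks in MHz and voltages in mV, or None.

    Assumption: clocks are in 10 kHz units, voltages in mV.
    """
    m = POWER_LEVEL_RE.search(line)
    if m is None:
        return None  # not a power-level line (e.g. the "uvd vclk" line)
    level, sclk, mclk, vddc, vddci = (int(g) for g in m.groups())
    return {
        "level": level,
        "sclk_mhz": sclk / 100,  # 10 kHz units -> MHz
        "mclk_mhz": mclk / 100,
        "vddc_mv": vddc,         # core voltage
        "vddci_mv": vddci,       # memory-interface voltage
    }


if __name__ == "__main__":
    for line in (
        "power level 2 sclk: 93000 mclk: 105000 vddc: 1185 vddci: 1150",
        "power level 0 sclk: 10000 mclk: 15000 vddc: 950 vddci: 950",
    ):
        print(parse_power_level(line))
```

Under that assumption the two readings above correspond to 930/1050 MHz at 1185 mV (high state) and 100/150 MHz at 950 mV (idle).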
And no lockups in two days.
There was also a dpm restructuring in 3.14 that may have helped, but as reported in comment 4, it didn't seem to help kilobug. You could try bisecting.
I also tried the drm-next-3.14 branch mentioned in comment 3 and got the same results as kilobug in comment 4, so it has to be something else. I will try to bisect it when I have some time.
I tried the drm-fixes-3.14 branch, but it didn't work better for me - playing a video with mplayer freezes the system in less than one minute (with radeon.dpm=1). I didn't run any other test with it.
If needed, I can run more tests during the week-end, but I don't know what kind of tests could be useful.
Created attachment 128321
Does the attached patch help?
Sorry, I'll be abroad until the end of March and won't be able to access the computer with the HD6850 (except through ssh, but it's hard to test a video card that way...). I'll test it when I'm back, probably during the 29-30 week-end.
Is there a compiled kernel available for ubuntu that includes this patch?
So, I was finally able to run some tests. Two notes first:
1. Meanwhile I upgraded to Mesa 10.1.0.
2. The patch seemed to be already included in the drm-fixes-3.14 branch, so I just tested with the latest version of that branch.
Now the tests themselves:
1. With dpm=0 and hyperz enabled, the drm-fixes-3.14 branch is stable.
2. With dpm=1 and hyperz enabled on Mesa 10.1, both the drm-fixes-3.14 and the vanilla 3.13 would freeze at boot (when loading the display manager, lightdm).
3. With dpm=1 and hyperz disabled, they work for a while, but as initially described in this bug, they eventually freeze or lock up. The most reliable way I found to trigger a freeze is to play an HD movie using the "gl" output of mplayer; with that, it freezes within a few minutes.
@Mihai: I'm uploading my build of the drm-fixes branch (which I wrongly named drmnext; it really is the drm-fixes branch, sorry, but I didn't rebuild it just to fix the name) to http://kilobug.net/debian. It's for Debian, not Ubuntu, but it *should* work on Ubuntu too.
With DPM=1 I still get gpu lockups with 3.14rc8 and mesa 10.2~git1403270730. It locks up with black screen or a pattern after a few minutes of playing OpenGL games or playing videos with VDPAU. I don't recall ever locking up in normal desktop use.
I changed my motherboard and CPU today (went from Intel to AMD CPU), but it didn't change the issue at all. I still get a freeze very quickly when playing a HD video on mplayer with the "GL" video output with dpm=1, but it works smoothly with dpm=0.
Is there any additional information I can provide to help? Would a dmidecode of the new motherboard be of any help? Additional tests?
Had the same issue over here. Switching to 3.14.2 fixed the problem for me.
Maybe that's an option for you :)
I still have the problem with 3.14.2. Are you sure you enabled dpm ? Due to this bug, dpm is disabled by default on 3.14 for this chipset, you need to force it enabled.
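Since DPM is off by default for these chips on 3.14, it's worth confirming that radeon.dpm=1 actually made it onto the running kernel's command line before comparing results. A small Python sketch, assuming the command line is readable from /proc/cmdline; the helper name radeon_dpm_from_cmdline is hypothetical, it ignores command-line quoting, and it assumes that a later occurrence of the parameter overrides an earlier one.

```python
import os


def radeon_dpm_from_cmdline(cmdline):
    """Return the radeon.dpm value from a kernel command line, or None."""
    value = None
    for token in cmdline.split():  # simple whitespace split, no quote handling
        key, sep, val = token.partition("=")
        if key == "radeon.dpm" and sep:
            value = val  # assumed: later occurrences override earlier ones
    return value


if __name__ == "__main__":
    path = "/proc/cmdline"
    if os.path.exists(path):
        with open(path) as f:
            setting = radeon_dpm_from_cmdline(f.read())
        if setting is None:
            print("radeon.dpm not set: the driver default applies")
        else:
            print(f"radeon.dpm={setting}")
```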
Oh, sorry, I didn't notice that! I was running it without DPM. Thanks for the hint :)
Alex marked my bug (https://bugzilla.kernel.org/show_bug.cgi?id=78321) as a duplicate. And my solution is to disable the hdmi audio output.
You can do this by appending radeon.audio=0 to the kernel parameters or just blacklist the snd_hda_intel and the snd_hda_codec_hdmi modules.
I tried with audio=0 with 3.14 and 3.15-rc8, and it's slightly better, but there are still freezes.
X doesn't crash at startup anymore even with hyperz enabled, but using an OpenGL game or playing a HD movie with mplayer ends up with a system freeze after a while, with hyperz enabled or disabled.
I'm experiencing the same problem with a XFX Radeon HD 7870 GHz Edition. I've tried both appending radeon.audio=0 to the kernel parameters and blacklisting snd_hda_codec_hdmi to no avail. Currently, I'm on 3.15.1, but I remember it happening on 3.14, too.
(In reply to andre+kernel from comment #43)
> I'm experiencing the same problem with a XFX Radeon HD 7870 GHz Edition.
> I've tried both appending radeon.audio=0 to the kernel parameters and
> blacklisting snd_hda_codec_hdmi to no avail. Currently, I'm on 3.15.1, but
> I remember it happening on 3.14, too.
You have different hardware. This bug is specific to BTC parts. You have an SI part. Please file a new bug.
I am experiencing the same problem with my Radeon HD7770. I've added radeon.audio=0 to no avail. I'm on the 3.15.1 kernel now too, but it happened in kernel 3.14 too! It's driving me crazy.
(In reply to Fabian Pas from comment #45)
> I am experiencing the same problem with my Radeon HD7770. I've added
> radeon.audio=0 to no avail. I'm on the 3.15.1 kernel now too, but it
> happened in kernel 3.14 too! It's driving me crazy.
Once again, different hardware, please file a new bug.
After testing radeon.audio=0 for a few days, I have to say that the bug is still there. But there is definitely a correlation between HDMI audio and this problem, as things get better with audio=0.
The X server often freezes on startup, but after a successful login I can work without problems.
Seems to be fixed on kernel 3.16rc2.
Cannot confirm. X still freezes with the latest kernel from git.
Another hint: radeon.audio=0 and blacklisting the snd_hda_intel modules are not the same. If I remove the modules, the problem disappears, but I still get the freezes with radeon.audio=0.
(In reply to perry3d from comment #49)
> Can not confirm. X stills freezes with latest kernel from git.
> Another hint: radeon.audio=0 and blacklisting the snd_intel_hda modules are
> not the same. If i remove the modules the problem disappears, but i still
> get the freezes with radeon.audio=0.
Is your card NI (Northern Islands)?
Please try this patch on 3.14/3.15/3.16-rc2/-rc3.
Actually, You'd want to try the cypress patch for BTC parts.
Created attachment 141741
Does this patch help?
Hi Alex and Dieter,
I have been using this patch for 5 minutes. So far I have no problems.
What is the difference between vddc and vddci?
(In reply to perry3d from comment #53)
> Hi Alex and Dieter,
> i am using this patch since 5 min. So far i have no problems.
> Thank you!
> What is the difference between vdcc and vdcci?
Two different voltages. vddc is the core voltage. vddci is the voltage related to the memory interface.
I tried the patch on the "drm-fixes-3.16" branch, but I got the same result: a freeze after a while when playing a movie with mplayer -vo gl (the most reliable way I found to trigger the freeze), with hyperz both enabled and disabled (I still have radeon.audio=0).
(In reply to kilobug from comment #55)
> I tried the patch on the "drm-fixes-3.16" branch, but I got the same result
> : freeze after a while when playing a movie on mplayer -vo gl (that's the
> most reliable method I found to get the freeze), same with hyperz enabled
> and disabled (I still have radeon.audio=0).
Are you sure you got the right branch? I pushed a testing branch containing the patch to drm-fixes-3.16-wip
Created attachment 141931
Log file after i boot Windows 7
The bug is still not fixed for me. I still get a freeze when logging in to KDE.
But I looked into why it worked flawlessly yesterday: I dual-boot my PC, and if I first start Windows 7 and then reboot into Arch Linux, everything works fine. But if I power off the system for a few hours and boot directly into Arch, I get the freezes again. I can reproduce this behavior (I did it three times).
I tried to find the difference between the two boot logs. Is there anything else that could help?
Created attachment 141941
Directly booted into Arch Linux
(In reply to Alex Deucher from comment #56)
> (In reply to kilobug from comment #55)
> > I tried the patch on the "drm-fixes-3.16" branch, but I got the same result
> > : freeze after a while when playing a movie on mplayer -vo gl (that's the
> > most reliable method I found to get the freeze), same with hyperz enabled
> > and disabled (I still have radeon.audio=0).
> Are you sure you got the right branch? I pushed a testing branch containing
> the patch to drm-fixes-3.16-wip
I mean I manually applied the patch on https://bugzilla.kernel.org/attachment.cgi?id=141741&action=diff to the "drm-fixes-3.16" branch, I'll try the drm-fixes-3.16-wip branch tomorrow (time to recompile & test).
(In reply to kilobug from comment #59)
> I mean I manually applied the patch on
> https://bugzilla.kernel.org/attachment.cgi?id=141741&action=diff to the
> "drm-fixes-3.16" branch, I'll try the drm-fixes-3.16-wip branch tomorrow
> (time to recompile & test).
OK, that's fine. No need to try the -wip branch directly if you already tried the patch.
Created attachment 142001
Things are getting worse. Now I got a kernel panic, but I was able to take a picture of the stack trace.
Created attachment 142011
journalctl -k before kernel panic
I also get crashes of the snd_hda module as you can see in the log.
(In reply to perry3d from comment #61)
> Things are getting worse. Now i got a kernel panic.
That panic is fixed in the current drm-fixes trees, and hopefully soon in Linus' tree.
Great news !
I tried the -wip branch, and so far no crash (with and without hyperz). I will keep running it and see how it works. I'll do more tests during the week-end, and hopefully they won't crash.
I don't know if I fumbled in applying the patch manually (would surprise me, but not impossible) or if there is another fix in the branch.
I did more tests, and it's much better, but still not perfect.
During GPU-intensive tasks (movie playback, Unigine benchmarks, a full Civ5 game) everything works well. But during non-GPU tasks (browsing the web, IRC, editing text) I got a few freezes.
The freeze doesn't seem to be complete (all the GUI freezes, but the rest seems to be still working at least for a while), I'll try to get logs/dmesg through ssh, but my laptop died a few days ago so I can't right now... I'll send more information as soon as I can get it.
@kilobug: did you try the SysRq keys (http://en.wikipedia.org/wiki/Magic_SysRq_key)? With Alt+SysRq+K I can kill X and start working again.
Oh, I forgot: you have to enable them with "echo 1 > /proc/sys/kernel/sysrq".
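Whether Alt+SysRq+K (SAK) works depends on the value in /proc/sys/kernel/sysrq: 0 disables SysRq, 1 enables every function, and any larger value is a bitmask in which 0x4 allows keyboard control (SAK, unraw), per the kernel's SysRq documentation. A small Python sketch of that check; sak_allowed is a hypothetical helper name.

```python
import os

SYSRQ_ENABLE_ALL = 1
SYSRQ_KEYBOARD_CONTROL = 0x4  # bit that allows SAK (Alt+SysRq+K) and unraw


def sak_allowed(sysrq_value):
    """Return True if the given /proc/sys/kernel/sysrq value permits SAK."""
    if sysrq_value == 0:
        return False  # SysRq disabled completely
    if sysrq_value == SYSRQ_ENABLE_ALL:
        return True   # 1 is special: all functions enabled
    return bool(sysrq_value & SYSRQ_KEYBOARD_CONTROL)  # values > 1 are a bitmask


if __name__ == "__main__":
    path = "/proc/sys/kernel/sysrq"
    if os.path.exists(path):
        with open(path) as f:
            value = int(f.read().strip())
        print(f"sysrq={value}, SAK allowed: {sak_allowed(value)}")
```

For example, a value of 176 (0x80 | 0x20 | 0x10) allows reboot, remount read-only and sync but not SAK; in that case writing 1 (or any value with 0x4 set) is needed.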
So I did some more tests and had a few freezes, mostly in non-GL tasks (terminal, ...). SAK does work to kill X and reset the card; thanks perry3d for the tip.
Here is a dmesg after a freeze and a SAK this morning.
Created attachment 144881
dmesg after freeze using terminal and a SAK
I just had my Sapphire HD 6870 (Dirt3 Edition) successfully reset itself after a lockup for the first time (as opposed to the usual black or funnily patterned screen), so I can post my dmesg. Gaming seems to work without a hitch (played through The Swapper in one go), but doing anything which uses the UVD seems to be a ticking time bomb.
(Yes, Magic SysRq keys work usually after a lockup.)
If there is any additional data that would help to debug this, please tell me how to provide it...
Created attachment 156571
dmesg after (uvd related) gpu lockup and successful reset
Created attachment 156581
vbios of the sapphire radeon hd 6870 dirt3 edition
This bug is still present on Debian unstable, I had it again yesterday; it regularly happens when running FlightGear. It is also present in the just-released jessie. It makes it a bit hazardous to run OpenGL software...
I am attaching the kernel log from yesterday's occurrence of the bug. My graphics adapter is:
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RV730 XT [Radeon HD 4670] (prog-if 00 [VGA controller])
Subsystem: PC Partner Limited / Sapphire Technology Device e100
Flags: bus master, fast devsel, latency 0, IRQ 50
Memory at e0000000 (64-bit, prefetchable) [size=256M]
Memory at fb9e0000 (64-bit, non-prefetchable) [size=64K]
I/O ports at be00 [size=256]
[virtual] Expansion ROM at fb900000 [disabled] [size=128K]
Capabilities:  Power Management version 3
Capabilities:  Express Legacy Endpoint, MSI 00
Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities:  Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Kernel driver in use: radeon
Symptoms are: screen goes blank for a few seconds, reappears, goes blank again, reappears garbled, alternates between blank and garbled a few times, and if I don't reboot quickly enough, I seem to recall (not sure about this one) that the whole system may crash in a way that requires a hard reset.
If there is something I can test to help solve this bug, please say so. I wonder if there are graphics adapters with decent OpenGL support using free drivers that don't suffer from this kind of bug...
(In reply to joe.r.floss.user from comment #73)
> 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
> RV730 XT [Radeon HD 4670] (prog-if 00 [VGA controller])
Please file your own bug. This bug is about different hardware.
Created attachment 176191
Kernel log for GPU lockup
Applies to a [AMD/ATI] RV730 XT [Radeon HD 4670] card
Ah, sorry, just realized it was a different series, will file a new bug. Thanks.
For information I just changed my GPU to a brand new R9 380X (which uses the amdgpu driver), so I won't be able to easily do more tests on this bug - I still have the 6850 in a box, so if you're pretty sure it's fixed I can swap the card and swap it back, but it's a "delicate" operation so I won't be able to do it frequently.
I have an HD 6850 and kernel 4.4.14.
This problem appeared for me when I changed rootflags.
Solved with rootflags=relatime,lazytime,commit=60 in the kernel parameters.
Forgot to mention: and relatime,lazytime,commit=60 in fstab.
I believe maybe DPM needs some filesystem information to work well.
dpm has nothing to do with the filesystem.
I think so as well...
Anyway, before, I had to put this in the kernel parameters, or else I got a black screen and a hang during boot:
And since I put this in grub.cfg and fstab, never had a black screen again:
rootflags=relatime,lazytime,commit=60 iommu=noagp radeon.dpm=1
I don't understand why, but it worked...
I have a PCIe XFX HD6850, an AMD 1055T, FOSS drivers and kernel 4.4.14.
Maybe it could help you to troubleshoot this error too.
I think it's worth a try, if you have spare time.
If this doesn't help you... then I'm sorry...
Forgot to mention... I appreciate your good work supporting the kernel!
Thank you all very much!
(In reply to Weber K. from comment #81)
> And since I put this in grub.cfg and fstab, never had a black screen again:
> rootflags=relatime,lazytime,commit=60 iommu=noagp radeon.dpm=1
It could be iommu=noagp, at least that would make more sense than filesystem flags...