Bug 68571 - GPU lockup on AMD Radeon HD6850 with DPM=1
Summary: GPU lockup on AMD Radeon HD6850 with DPM=1
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-01-11 15:04 UTC by kilobug
Modified: 2016-09-20 06:15 UTC

See Also:
Kernel Version: 3.13-rc6 / 3.12
Subsystem:
Regression: No
Bisected commit-id:


Attachments
DMESG containing two lockups (and boot messages) (66.31 KB, text/plain)
2014-01-11 15:05 UTC, kilobug
DMESG with the drm-next-3.14 branch (68.76 KB, text/plain)
2014-01-12 19:39 UTC, kilobug
The VBIOS (as asked on https://bugs.freedesktop.org/show_bug.cgi?id=73053) (64.00 KB, application/octet-stream)
2014-01-12 19:42 UTC, kilobug
Xorg.log (with dpm enabled) (44.06 KB, text/plain)
2014-01-12 19:44 UTC, kilobug
Xorg.log (with dpm disabled) (54.72 KB, text/plain)
2014-01-12 19:45 UTC, kilobug
DMESG : with the drm-next branch and the patch from https://bugzilla.kernel.org/show_bug.cgi?id=68571#c5 (80.81 KB, text/plain)
2014-01-15 20:31 UTC, kilobug
dmesg with two gpu lockups (84.81 KB, text/plain)
2014-01-31 00:44 UTC, perry3d
dmesg from drm-next 3.14 with patch (78.19 KB, application/octet-stream)
2014-02-01 15:20 UTC, perry3d
The vbios from the MSI R6870 Hawk (62.50 KB, application/octet-stream)
2014-02-01 15:21 UTC, perry3d
possible fix (1.13 KB, patch)
2014-03-06 21:33 UTC, Alex Deucher
possible fix (1.19 KB, patch)
2014-07-01 16:16 UTC, Alex Deucher
Log file after i boot Windows 7 (101.90 KB, text/x-log)
2014-07-02 18:28 UTC, perry3d
Directly booted into Arch Linux (110.14 KB, text/x-log)
2014-07-02 18:28 UTC, perry3d
Kernel panic (320.17 KB, image/jpeg)
2014-07-03 07:51 UTC, perry3d
journalctl -k before kernel panic (185.54 KB, text/x-log)
2014-07-03 07:54 UTC, perry3d
dmesg after freeze using terminal and a SAK (78.69 KB, text/plain)
2014-08-02 07:24 UTC, kilobug
dmesg after (uvd related) gpu lockup and successful reset (124.66 KB, text/x-log)
2014-11-05 02:18 UTC, prettyvanilla
vbios of the sapphire radeon hd 6870 dirt3 edition (64.00 KB, application/octet-stream)
2014-11-05 02:19 UTC, prettyvanilla
Kernel log for GPU lockup (85.57 KB, text/plain)
2015-05-08 13:33 UTC, joe.r.floss.user

Description kilobug 2014-01-11 15:04:26 UTC
Hi,

When using dpm with my Radeon HD6850 I get frequent "GPU lockup" events, with the screen freezing for a few seconds. Sometimes (but not often) the system freezes completely and needs a hard reset. It happens both while playing OpenGL games and during normal (not GPU-intensive) activities, slightly more often during games.

I have the issue with both kernel 3.12 (manually enabling the DPM) and 3.13-rc6 (in which it's enabled by default). With DPM disabled, all is working fine. I'm using Mesa 10.0.1, but had the problem with Mesa 9.2.2 on kernel 3.12 too (didn't try Mesa 9.2.2 with kernel 3.13-rc6).

I'm using 64-bit Debian GNU/Linux; all packages are from the Debian archive (Mesa 10 and Linux 3.13-rc6 from experimental), but I don't think the problem is Debian-specific, so I prefer to report it here.

I include a "dmesg" containing two of those lockups.

The card itself is a Sapphire HD6850 with 1 GB of GDDR5 memory, the CPU is an Intel(R) Core(TM)2 Duo CPU E8400, memory is 2x2 GB of DDR2, and the motherboard is an ASUSTeK P5E3.

Feel free to ask for any additional information, or any test I could perform.

Regards,
Comment 1 kilobug 2014-01-11 15:05:16 UTC
Created attachment 121641 [details]
DMESG containing two lockups (and boot messages)
Comment 2 Alex Deucher 2014-01-11 15:53:30 UTC
Please see:
https://bugs.freedesktop.org/show_bug.cgi?id=73053
Please try the suggestions on that bug.
Comment 3 Alex Deucher 2014-01-11 15:59:03 UTC
Are things any better with my 3.14 branch?
http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-3.14
Comment 4 kilobug 2014-01-12 19:38:39 UTC
Sorry, but it seems even worse: I get the "lockup" as often, but it doesn't recover from it anymore. Either the system fully freezes after a lockup (with garbage on the screen), or it keeps freezing and resetting in a loop.

By switching to VT1 after a freeze I managed to do a dmesg while it was in a freeze/reset loop.

With dpm=0 it works fine on your branch (like it does on the 3.12 or 3.13-rc6).

PS: I get the version string "3.13.0-rc4+" with your branch; is that right, or did I use the wrong version? I got it with "git clone git://people.freedesktop.org/~agd5f/linux" and then "git checkout drm-next-3.14".
Comment 5 kilobug 2014-01-12 19:39:32 UTC
Created attachment 121681 [details]
DMESG with the drm-next-3.14 branch
Comment 6 kilobug 2014-01-12 19:42:45 UTC
Created attachment 121691 [details]
The VBIOS (as asked on https://bugs.freedesktop.org/show_bug.cgi?id=73053)
Comment 7 kilobug 2014-01-12 19:44:55 UTC
Created attachment 121701 [details]
Xorg.log (with dpm enabled)
Comment 8 kilobug 2014-01-12 19:45:20 UTC
Created attachment 121711 [details]
Xorg.log (with dpm disabled)
Comment 9 Alex Deucher 2014-01-13 17:37:13 UTC
(In reply to kilobug from comment #4)
> Sorry, but it seems even worse: I get the "lockup" as often, but it doesn't
> recover from it anymore. Either the system fully freezes after a lockup (with
> garbage on the screen), or it keeps freezing and resetting in a loop.
> 

Have you tried this patch from comment 5 of the other bug?
https://bugs.freedesktop.org/attachment.cgi?id=91609


> By switching to VT1 after a freeze I managed to do a dmesg while it was in a
> freeze/reset loop.
> 
> With dpm=0 it works fine on your branch (like it does on the 3.12 or
> 3.13-rc6).
> 
> PS : I get the version string "3.13.0-rc4+" with your branch, is that right
> or did I use the wrong version ? I got it with "git clone
> git://people.freedesktop.org/~agd5f/linux" and then "git checkout
> drm-next-3.14".

Yes, that is correct, it's based on Dave's drm-next branch.
Comment 10 kilobug 2014-01-15 20:30:40 UTC
I tried the patch with various kinds of programs (3d games, 2d games, desktop apps, mplayer, ...). It worked somewhat better, but I still got lockups with some programs (I'll send the dmesg), and even some complete freezes (for example while playing a video with mplayer).

If needed, I can run more tests with various programs (games, phoronix test suite, ...) and report the result for each program (works well, lockup with recovery, or freeze). But they'll take some time (since I'll need to reboot quite a lot), so I'll only be able to do it during the weekend, and only if it's useful. Also, should I limit myself to free software, or should I include some non-free games in the mix?
Comment 11 kilobug 2014-01-15 20:31:46 UTC
Created attachment 122201 [details]
DMESG : with the drm-next branch and the patch from https://bugzilla.kernel.org/show_bug.cgi?id=68571#c5
Comment 12 Alex Deucher 2014-01-15 21:14:05 UTC
Does disabling hyperZ help?  E.g., set env var R600_DEBUG=nohyperz in /etc/environment or however your distro handles global env vars.
Comment 13 kilobug 2014-01-16 19:17:48 UTC
I tried it, but it didn't help. I also tried removing the "R600_DEBUG=sb" that I had there; that didn't help either.

Note: I tried both with the patched drm-next kernel. Should I try with another build?
Comment 14 kilobug 2014-01-30 08:50:05 UTC
Is there anything else I can do to help locate or fix the issue?
Comment 15 perry3d 2014-01-31 00:44:41 UTC
Created attachment 123901 [details]
dmesg with two gpu lockups

Hi,

I think I have the same problem with kernel 3.13 on Arch Linux with dpm enabled.
The GPU is an MSI R6870 Hawk.
Here is the output of dmesg after two GPU lockups.
A small hint: it almost always happens when playing a video (mplayer or browser).

No problems if I append radeon.dpm=0 to the kernel command line.
Comment 16 perry3d 2014-01-31 10:55:31 UTC
I forgot to say: the problem started after I switched my board and CPU a week ago.
I now use a Gigabyte H87-HD3 board with an Intel Xeon E3-1230v3 CPU. Before that, it was an AMD CPU with a Foxconn A7DA-S board.

Maybe it has something to do with the switch from PCIExpress 2.0 to 3.0?
Comment 17 kilobug 2014-01-31 11:09:31 UTC
I have a PCIExpress 2.0 board (and the lockups), but I have an Intel CPU. Not sure it could be linked to Intel CPU vs AMD CPU, or something else related to the motherboard. I can include a dmidecode output if it can be of any use.
Comment 18 perry3d 2014-02-01 15:16:40 UTC
You are right, it's not PCIExpress 2.0 vs. 3.0. I forced it in the BIOS, with no improvement.

I also tried the drm-next-3.14 branch from Alex's repository. It made things even worse: it freezes on boot. Before, I at least reached KDM and sometimes saw the desktop.
After applying the patch from https://bugzilla.kernel.org/show_bug.cgi?id=68571#c5 I get the same behaviour as with the stock 3.13 kernel.

I discovered that there is no kernel panic. I can get back to KDM with the SysRq keys (Alt + Print + K).

Maybe it's because my card is slightly overclocked by the factory: http://msi.com/product/vga/R6870-Hawk.html (compared with the reference card: 930 MHz vs 900 MHz).

Is there anything else I can do?
Comment 19 perry3d 2014-02-01 15:20:09 UTC
Created attachment 124071 [details]
dmesg from drm-next 3.14 with patch

This is the dmesg log file after getting a GPU freeze. The kernel is built from the drm-next-3.14 branch and additionally includes the patch from https://bugzilla.kernel.org/show_bug.cgi?id=68571#c5
Comment 20 perry3d 2014-02-01 15:21:45 UTC
Created attachment 124081 [details]
The vbios from the MSI R6870 Hawk

Downloaded the vbios as described in https://bugs.freedesktop.org/show_bug.cgi?id=73053
Comment 21 Alexandre Demers 2014-02-01 15:24:22 UTC
(In reply to perry3d from comment #18)
> You are right, its not PCIExpress 2.0 vs. 3.0. I forced it in the bios, no
> improvment.
> 
> I also tried the drm-next-3.14 branch from Alex repository. Made it even
> worse: freeze on boot. Before i reached kdm and sometimes i saw the desktop.
> 
> After applying the patch from
> https://bugzilla.kernel.org/show_bug.cgi?id=68571#c5 i am on the same
> behaviour like i was with stock kernel 3.13.
> 
> I discovered that there is no kernel panic. I can get back to KDM with the
> SysRQ keys (Alt + Print + K).
> 
> Maybe its because my card is slightly overclocked by the factory:
> http://msi.com/product/vga/R6870-Hawk.html (in compare with the reference
> card: 930Mhz vs 900Mhz).
> 
> Something else i can do?

I may be wrong here, but I think that for now (this is something I worked on with Alex Deucher in a different bug), overclocked cards are being limited to the reference values.
Comment 22 perry3d 2014-02-01 15:31:09 UTC
That's no problem, as long as they are stable :).
But I think you are wrong: in the dmesg output the card is set to 930 MHz (if I read it correctly), and the stock 6870 uses 900 MHz. Maybe the voltage is not adjusted correctly? What is the effect of an undervolted card?
Comment 23 Alexandre Demers 2014-02-01 16:38:54 UTC
(In reply to perry3d from comment #22)
> That's no problem, as long as they are stable :).
> But i think you are wrong: in the dmesg output the card is set to 930 Mhz
> (if i read it correctly). And the stock 6870 uses 900Mhz. Maybe the voltage
> is not adjusted correctly? Whats the effect of an undervolted card?

Yes, but if I remember correctly again, the real values are not printed out in kernel <= 3.13 if they were limited. I think I've seen a patch from Alex or Christian about that so we can have the real values in kernel 3.14.

Otherwise, I could provide you a patch of mine that gives the real values used (it's printed as a warning saying the values have been limited to X). I'm experiencing similar problems with my HD6950, that's why I've been digging on my side.
Comment 24 Alex Deucher 2014-02-01 20:18:06 UTC
For reference the patch is here:
http://lists.freedesktop.org/archives/dri-devel/2014-January/052947.html
Comment 25 kilobug 2014-02-02 13:26:55 UTC
There is no overclocking on my card: the reference HD6850 runs at 775 MHz (according to http://www.amd.com/us/products/desktop/graphics/amd-radeon-hd-6000/hd-6850/Pages/amd-radeon-hd-6850-overview.aspx#2) and the Sapphire has the same frequency, according to both the Sapphire website (http://www.sapphiretech.com/presentation/product/?pid=497&lid=1) and the radeon-profile utility.
Comment 26 perry3d 2014-02-08 20:54:32 UTC
Today I built the drm-fixes-3.14 branch from git://people.freedesktop.org/~agd5f/linux . No problems so far. It would be interesting to know which commit fixed the GPU lockups.
Comment 27 Alex Deucher 2014-02-09 23:43:09 UTC
dpm is disabled by default again on these asics until we fix this issue.
Comment 28 perry3d 2014-02-10 22:00:29 UTC
DPM is enabled by passing radeon.dpm=1 in the kernel parameters:

root@perry64 # cat /sys/kernel/debug/dri/64/radeon_pm_info
uvd    vclk: 0 dclk: 0
power level 2    sclk: 93000 mclk: 105000 vddc: 1185 vddci: 1150
root@perry64 # cat /sys/kernel/debug/dri/64/radeon_pm_info
uvd    vclk: 0 dclk: 0
power level 0    sclk: 10000 mclk: 15000 vddc: 950 vddci: 950

And no lockups for two days.
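
A minimal sketch, not part of the driver, for turning a radeon_pm_info line like the ones above into numbers when comparing power levels. The regular expression only targets the line format shown in this comment, and the unit conversion is an assumption: the clocks are taken to be in units of 10 kHz, which is consistent with sclk 93000 matching the 930 MHz discussed earlier in this bug. The name parse_power_level is hypothetical.

```python
import re

# Matches lines such as:
#   power level 2    sclk: 93000 mclk: 105000 vddc: 1185 vddci: 1150
POWER_LEVEL_RE = re.compile(
    r"power level (?P<level>\d+)\s+"
    r"sclk:\s*(?P<sclk>\d+)\s+mclk:\s*(?P<mclk>\d+)\s+"
    r"vddc:\s*(?P<vddc>\d+)\s+vddci:\s*(?P<vddci>\d+)"
)


def parse_power_level(text):
    """Return the first power-level line found in `text` as a dict, or None."""
    match = POWER_LEVEL_RE.search(text)
    if match is None:
        return None
    fields = {name: int(value) for name, value in match.groupdict().items()}
    # Assumption: clocks are reported in units of 10 kHz, so /100 gives MHz.
    fields["sclk_mhz"] = fields["sclk"] / 100
    fields["mclk_mhz"] = fields["mclk"] / 100
    return fields


# Sample text copied from the output quoted in this comment.
sample = (
    "uvd    vclk: 0 dclk: 0\n"
    "power level 2    sclk: 93000 mclk: 105000 vddc: 1185 vddci: 1150\n"
)
print(parse_power_level(sample))
```

On a live system the text would come from the debugfs file shown above, which normally requires root to read.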
Comment 29 Alex Deucher 2014-02-10 22:23:09 UTC
There was also a dpm restructuring in 3.14 that may have helped, but as reported in comment 4, it didn't seem to help kilobug.  You could try bisecting.
Comment 30 perry3d 2014-02-10 22:30:53 UTC
I also tried the drm-next-3.14 branch mentioned in comment 3 and got the same results as kilobug in comment 4. So it has to be something else. I will try to bisect it when I have some time.
Comment 31 kilobug 2014-02-11 18:21:26 UTC
I tried the drm-fixes-3.14 branch, but it didn't work better for me - playing a video with mplayer freezes the system in less than one minute (with radeon.dpm=1). I didn't run any other test with it.

If needed, I can run more tests during the week-end, but I don't know what kind of tests could be useful.
Comment 32 Alex Deucher 2014-03-06 21:33:19 UTC
Created attachment 128321 [details]
possible fix

Does the attached patch help?
Comment 33 kilobug 2014-03-06 23:04:11 UTC
Sorry I'll be abroad until the end of March and I won't be able to access the computer with the HD6850 (except through ssh, but hard to test video card that way...), I'll test that when I'm back, probably during the 29-30 week-end.
Comment 34 Mihai Coman 2014-03-07 18:11:50 UTC
Is there a compiled kernel available for ubuntu that includes this patch?
Comment 35 kilobug 2014-03-29 10:39:23 UTC
So, I finally could perform some tests. Two notes first :

1. Meanwhile I upgraded to Mesa 10.1.0.

2. The patch seemed to be already included in the drm-fixes-3.14 branch, so I just tested with the latest version of that branch.

Now the tests themselves :

1. With dpm=0 and hyperz enabled, the drm-fixes-3.14 branch is stable.

2. With dpm=1 and hyperz enabled on Mesa 10.1, both the drm-fixes-3.14 and the vanilla 3.13 would freeze at boot (when loading the display manager, lightdm).

3. With dpm=1 and hyperz disabled, they would work for a while, but as initially described on this bug, after a while they freeze or lock up. The most reliable way I found to trigger a freeze is to play a HD movie using the "gl" output of mplayer; with that, it freezes in a few minutes.


@Mihai: I'm uploading my build of the drm-fixes branch (which I wrongly named drmnext, but it's really the drm-fixes branch; sorry, I didn't rebuild it just to fix the name) to http://kilobug.net/debian. It's for Debian, not Ubuntu, but it *should* work on Ubuntu too.
Comment 36 Mihai Coman 2014-03-30 16:57:41 UTC
With DPM=1 I still get gpu lockups with 3.14rc8 and mesa 10.2~git1403270730. It locks up with black screen or a pattern after a few minutes of playing OpenGL games or playing videos with VDPAU. I don't recall ever locking up in normal desktop use.
Comment 37 kilobug 2014-04-12 08:37:03 UTC
I changed my motherboard and CPU today (went from Intel to AMD CPU), but it didn't change the issue at all. I still get a freeze very quickly when playing a HD video on mplayer with the "GL" video output with dpm=1, but it works smoothly with dpm=0.

Is there any additional information I can provide to help? Would a dmidecode of the new motherboard be of any help? Additional tests?
Comment 38 creich 2014-05-04 11:41:32 UTC
I had the same issue over here. I switched to 3.14.2, which fixed the problem for me.

Maybe that's an option for you :)
Comment 39 kilobug 2014-05-08 09:39:56 UTC
I still have the problem with 3.14.2. Are you sure you enabled dpm? Due to this bug, dpm is disabled by default on 3.14 for this chipset, so you need to force it on.
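
A minimal sketch for checking whether dpm was actually forced on the kernel command line before reporting results. It only looks at a command-line string such as the contents of /proc/cmdline; options set elsewhere (for example through modprobe configuration) are not covered. The function name radeon_dpm_setting is hypothetical.

```python
def radeon_dpm_setting(cmdline):
    """Return the radeon.dpm value on a kernel command line, or None if absent."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == "radeon.dpm" and sep:
            # Keep the last occurrence if the parameter is given more than once.
            value = val
    return value


# Example strings; on a running system pass open("/proc/cmdline").read() instead.
print(radeon_dpm_setting("root=/dev/sda1 ro quiet radeon.dpm=1"))
print(radeon_dpm_setting("root=/dev/sda1 ro quiet"))
```

A result of None means the parameter was not given, so the driver default applies, which per comment 27 is dpm disabled on these asics.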
Comment 40 creich 2014-06-17 05:41:00 UTC
Oh, sorry, I didn't notice that! I used it without dpm. Thanks for the hint :)
Comment 41 perry3d 2014-06-19 14:07:10 UTC
Hello kilobug,

Alex marked my bug (https://bugzilla.kernel.org/show_bug.cgi?id=78321) as a duplicate. And my solution is to disable the hdmi audio output. 
You can do this by appending radeon.audio=0 to the kernel parameters or just blacklist the snd_hda_intel and the snd_hda_codec_hdmi modules.
Comment 42 kilobug 2014-06-21 08:18:57 UTC
I tried with audio=0 with 3.14 and 3.15-rc8, and it's slightly better, but there are still freezes.

X doesn't crash at startup anymore even with hyperz enabled, but using an OpenGL game or playing a HD movie with mplayer ends up with a system freeze after a while, with hyperz enabled or disabled.
Comment 43 andre+kernel 2014-06-23 21:22:19 UTC
I'm experiencing the same problem with a XFX Radeon HD 7870 GHz Edition.  I've tried both appending radeon.audio=0 to the kernel parameters and blacklisting snd_hda_codec_hdmi to no avail.  Currently, I'm on 3.15.1, but I remember it happening on 3.14, too.
Comment 44 Alex Deucher 2014-06-23 21:44:51 UTC
(In reply to andre+kernel from comment #43)
> I'm experiencing the same problem with a XFX Radeon HD 7870 GHz Edition. 
> I've tried both appending radeon.audio=0 to the kernel parameters and
> blacklisting snd_hda_codec_hdmi to no avail.  Currently, I'm on 3.15.1, but
> I remember it happening on 3.14, too.

You have different hardware.  This bug is specific to BTC parts.  You have an SI part.  Please file a new bug.
Comment 45 Fabian Pas 2014-06-26 14:26:46 UTC
I am experiencing the same problem with my Radeon HD7770. I've added radeon.audio=0 to no avail. I'm on the 3.15.1 kernel now too, but it happened in kernel 3.14 too! It's driving me crazy.
Comment 46 Alex Deucher 2014-06-26 14:30:05 UTC
(In reply to Fabian Pas from comment #45)
> I am experiencing the same problem with my Radeon HD7770. I've added
> radeon.audio=0 to no avail. I'm on the 3.15.1 kernel now too, but it
> happened in kernel 3.14 too! It's driving me crazy.

Once again, different hardware, please file a new bug.
Comment 47 perry3d 2014-06-26 14:33:32 UTC
After testing radeon.audio=0 for a few days, I have to say that the bug still remains. But there is definitely a correlation between the HDMI audio and this problem, as things get better with audio=0.
The X server often freezes on startup, but after a successful login I can work without problems.
Comment 48 Mihai Coman 2014-06-29 19:11:51 UTC
Seems to be fixed on kernel 3.16rc2.
Comment 49 perry3d 2014-06-30 22:00:30 UTC
Cannot confirm. X still freezes with the latest kernel from git.

Another hint: radeon.audio=0 and blacklisting the snd_hda_intel modules are not the same. If I remove the modules the problem disappears, but I still get the freezes with radeon.audio=0.
Comment 50 Dieter Nützel 2014-07-01 00:10:59 UTC
(In reply to perry3d from comment #49)
> Cannot confirm. X still freezes with the latest kernel from git.
> 
> Another hint: radeon.audio=0 and blacklisting the snd_hda_intel modules are
> not the same. If I remove the modules the problem disappears, but I still
> get the freezes with radeon.audio=0.

Your card is NI (Northern Islands)?
Please try this patch on 3.14/3.15/3.16-rc2/-rc3.
https://bugzilla.kernel.org/show_bug.cgi?id=79071#c4
Comment 51 Alex Deucher 2014-07-01 16:09:04 UTC
Actually, you'd want to try the cypress patch for BTC parts.
Comment 52 Alex Deucher 2014-07-01 16:16:50 UTC
Created attachment 141741 [details]
possible fix

Does this patch help?
Comment 53 perry3d 2014-07-01 16:56:05 UTC
Hi Alex and Dieter,

I have been using this patch for 5 minutes. So far I have no problems.
Thank you!

What is the difference between vddc and vddci?
Comment 54 Alex Deucher 2014-07-01 16:57:38 UTC
(In reply to perry3d from comment #53)
> Hi Alex and Dieter,
> 
> I have been using this patch for 5 minutes. So far I have no problems.
> Thank you!
> 
> What is the difference between vddc and vddci?

Two different voltages.  vddc is the core voltage.  vddci is the voltage related to the memory interface.
Comment 55 kilobug 2014-07-02 18:11:43 UTC
I tried the patch on the "drm-fixes-3.16" branch, but I got the same result : freeze after a while when playing a movie on mplayer -vo gl (that's the most reliable method I found to get the freeze), same with hyperz enabled and disabled (I still have radeon.audio=0).
Comment 56 Alex Deucher 2014-07-02 18:21:20 UTC
(In reply to kilobug from comment #55)
> I tried the patch on the "drm-fixes-3.16" branch, but I got the same result
> : freeze after a while when playing a movie on mplayer -vo gl (that's the
> most reliable method I found to get the freeze), same with hyperz enabled
> and disabled (I still have radeon.audio=0).

Are you sure you got the right branch?  I pushed a testing branch containing the patch to drm-fixes-3.16-wip
Comment 57 perry3d 2014-07-02 18:28:18 UTC
Created attachment 141931 [details]
Log file after i boot Windows 7

Hi again,

The bug is still not fixed for me. I still get a freeze when logging in to KDE.

But I did some research into why it worked flawlessly yesterday: I use a dual-boot system on my PC, and if I first start Windows 7 and then reboot into Arch Linux, everything works fine. But if I power off the system for some hours and boot directly into Arch, I get the freezes again. And I can reproduce this behavior (I did it three times).

I tried to get the difference between both boot logs. Is there something else that can help?
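
A minimal sketch of one way to compare two boot logs like the ones attached here, using Python's difflib. It assumes dmesg-style lines that start with a "[   12.345678]" timestamp and strips that prefix so that timing differences alone do not show up as changes; logs in another format would need a different pattern. The function names and the sample lines are hypothetical and not taken from the attachments.

```python
import difflib
import re

# Assumption: each line starts with a dmesg-style "[   12.345678] " timestamp.
TIMESTAMP_RE = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def strip_timestamps(log_text):
    """Split a log into lines and drop the leading timestamp from each line."""
    return [TIMESTAMP_RE.sub("", line) for line in log_text.splitlines()]


def diff_boot_logs(good_log, bad_log):
    """Return unified-diff lines between two kernel logs, ignoring timestamps."""
    return list(difflib.unified_diff(
        strip_timestamps(good_log),
        strip_timestamps(bad_log),
        fromfile="reboot-from-windows",
        tofile="cold-boot",
        lineterm="",
    ))


# Made-up sample lines, only to show the shape of the output.
good = "[    1.000000] example line common\n[    2.000000] example line A"
bad = "[    1.500000] example line common\n[    2.700000] example line B"
for line in diff_boot_logs(good, bad):
    print(line)
```

Lines that differ only in their timestamp are treated as identical, so the remaining diff is limited to messages whose text changed between the two boots.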
Comment 58 perry3d 2014-07-02 18:28:54 UTC
Created attachment 141941 [details]
Directly booted into Arch Linux
Comment 59 kilobug 2014-07-02 20:26:55 UTC
(In reply to Alex Deucher from comment #56)
> (In reply to kilobug from comment #55)
> > I tried the patch on the "drm-fixes-3.16" branch, but I got the same result
> > : freeze after a while when playing a movie on mplayer -vo gl (that's the
> > most reliable method I found to get the freeze), same with hyperz enabled
> > and disabled (I still have radeon.audio=0).
> 
> Are you sure you got the right branch?  I pushed a testing branch containing
> the patch to drm-fixes-3.16-wip

I mean I manually applied the patch on https://bugzilla.kernel.org/attachment.cgi?id=141741&action=diff to the "drm-fixes-3.16" branch, I'll try the drm-fixes-3.16-wip branch tomorrow (time to recompile & test).
Comment 60 Alex Deucher 2014-07-02 20:28:05 UTC
(In reply to kilobug from comment #59)
> I mean I manually applied the patch on
> https://bugzilla.kernel.org/attachment.cgi?id=141741&action=diff to the
> "drm-fixes-3.16" branch, I'll try the drm-fixes-3.16-wip branch tomorrow
> (time to recompile & test).

Ok, that's fine.  no need to try the -wip branch directly if you already tried the patch.
Comment 61 perry3d 2014-07-03 07:51:41 UTC
Created attachment 142001 [details]
Kernel panic

Things are getting worse. Now I got a kernel panic, but I was able to take a picture of the stack trace.
Comment 62 perry3d 2014-07-03 07:54:34 UTC
Created attachment 142011 [details]
journalctl -k before kernel panic

I also get crashes of the snd_hda module as you can see in the log.
Comment 63 Michel Dänzer 2014-07-03 08:10:33 UTC
(In reply to perry3d from comment #61)
> Things are getting worse. Now i got a kernel panic.

That panic is fixed in the current drm-fixes trees, and hopefully soon in Linus' tree.
Comment 64 kilobug 2014-07-03 20:50:26 UTC
Great news !

I tried the -wip branch, and so far no crash (with and without hyperz). I will keep running it and see how it works. I'll do more tests during the week-end, and hopefully they won't crash.

I don't know if I fumbled in applying the patch manually (would surprise me, but not impossible) or if there is another fix in the branch.
Comment 65 kilobug 2014-07-05 08:36:59 UTC
I did more tests, and it's much better, but still not perfect.

During GPU-intensive tasks (movie playing, Unigine benchmarks, a full Civ5 game) it all works well. But during non-GPU tasks (browsing the web, IRC, editing text) I got a few freezes.

The freeze doesn't seem to be complete (all the GUI freezes, but the rest seems to be still working at least for a while), I'll try to get logs/dmesg through ssh, but my laptop died a few days ago so I can't right now... I'll send more information as soon as I can get it.
Comment 66 perry3d 2014-07-05 11:46:20 UTC
@kilobug: did you try the SysRq keys (http://en.wikipedia.org/wiki/Magic_SysRq_key)? With Alt+Print+k I can kill X and start working again.
Comment 67 perry3d 2014-07-05 11:47:25 UTC
Oh, I forgot: you have to enable them with "echo 1 > /proc/sys/kernel/sysrq".
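
A minimal sketch for checking whether the current sysrq setting already allows SAK (SysRq+k) before changing it. It relies on the value in /proc/sys/kernel/sysrq being either 0 (disabled), 1 (everything enabled) or a bitmask in which the bit with value 4 covers keyboard control, including SAK. The function name sak_allowed is hypothetical.

```python
SYSRQ_ENABLE_ALL = 1
SYSRQ_KEYBOARD_CONTROL = 4  # bitmask value covering SAK (SysRq+k) and unraw


def sak_allowed(sysrq_value):
    """Return True if the given sysrq setting permits SAK."""
    if sysrq_value == SYSRQ_ENABLE_ALL:
        return True
    return bool(sysrq_value & SYSRQ_KEYBOARD_CONTROL)


# Example values; on a running system use
# int(open("/proc/sys/kernel/sysrq").read()) instead.
for value in (0, 1, 176, 438):
    print(value, sak_allowed(value))
```

If this returns False, writing 1 as described above enables all SysRq functions.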
Comment 68 kilobug 2014-08-02 07:23:26 UTC
So I did a few more tests. I had a few freezes, mostly in non-GL tasks (terminal, ...). SAK does work to kill X and reset the card; thanks perry3d for the tip.

Here is a dmesg after a freeze and a SAK this morning.
Comment 69 kilobug 2014-08-02 07:24:01 UTC
Created attachment 144881 [details]
dmesg after freeze using terminal and a SAK
Comment 70 prettyvanilla 2014-11-05 02:17:37 UTC
I just had my Sapphire HD 6870 (Dirt3 Edition) successfully reset itself after a lockup for the first time (as opposed to the usual black or funnily patterned screen), so I can post my dmesg. Gaming seems to work without a hitch (played through The Swapper in one go), but doing anything which uses the UVD seems to be a ticking time bomb.
(Yes, Magic SysRq keys work usually after a lockup.)

If there is any additional data that would help to debug this, please tell me how to provide it...
Comment 71 prettyvanilla 2014-11-05 02:18:36 UTC
Created attachment 156571 [details]
dmesg after (uvd related) gpu lockup and successful reset
Comment 72 prettyvanilla 2014-11-05 02:19:14 UTC
Created attachment 156581 [details]
vbios of the sapphire radeon hd 6870 dirt3 edition
Comment 73 joe.r.floss.user 2015-05-08 13:25:13 UTC
Hello,

This bug is still present on Debian unstable, I had it again yesterday; it regularly happens when running FlightGear. It is also present in the just-released jessie. It makes it a bit hazardous to run OpenGL software...

I am attaching the kernel log from yesterday's occurrence of the bug. My graphics adapter is:

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RV730 XT [Radeon HD 4670] (prog-if 00 [VGA controller])
        Subsystem: PC Partner Limited / Sapphire Technology Device e100
        Flags: bus master, fast devsel, latency 0, IRQ 50
        Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Memory at fb9e0000 (64-bit, non-prefetchable) [size=64K]
        I/O ports at be00 [size=256]
        [virtual] Expansion ROM at fb900000 [disabled] [size=128K]
        Capabilities: [50] Power Management version 3
        Capabilities: [58] Express Legacy Endpoint, MSI 00
        Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Kernel driver in use: radeon

Symptoms are: screen goes blank for a few seconds, reappears, goes blank again, reappears garbled, alternates between blank and garbled a few times, and if I don't reboot quickly enough, I seem to recall (not sure about this one) that the whole system may crash in a way that requires a hard reset.

If there is something I can test to help solve this bug, please say so. I wonder if there are graphics adapters with decent OpenGL support using free drivers that don't suffer from this kind of bug...

Thanks
Comment 74 Alex Deucher 2015-05-08 13:27:47 UTC
(In reply to joe.r.floss.user from comment #73)
> 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
> RV730 XT [Radeon HD 4670] (prog-if 00 [VGA controller])


Please file your own bug.  This bug is about different hardware.
Comment 75 joe.r.floss.user 2015-05-08 13:33:26 UTC
Created attachment 176191 [details]
Kernel log for GPU lockup

Applies to a [AMD/ATI] RV730 XT [Radeon HD 4670] card
Comment 76 joe.r.floss.user 2015-05-08 13:35:29 UTC
Ah, sorry, just realized it was a different series, will file a new bug. Thanks.
Comment 77 kilobug 2016-03-26 12:10:12 UTC
For information I just changed my GPU to a brand new R9 380X (which uses the amdgpu driver), so I won't be able to easily do more tests on this bug - I still have the 6850 in a box, so if you're pretty sure it's fixed I can swap the card and swap it back, but it's a "delicate" operation so I won't be able to do it frequently.
Comment 78 Weber K. 2016-09-20 01:27:10 UTC
Hi!

I have HD 6850 and Kernel 4.4.14.

This problem appeared for me when I changed rootflags.
Solved with rootflags=relatime,lazytime,commit=60 in kernel parameters.

HTH

Best regards
Weber Kai
Comment 79 Weber K. 2016-09-20 01:45:45 UTC
Forgot to mention: And relatime,lazytime,commit=60 in fstab
I believe dpm may need some fs information to work well.
Comment 80 Alex Deucher 2016-09-20 03:05:45 UTC
dpm has nothing to do with the filesystem.
Comment 81 Weber K. 2016-09-20 04:54:45 UTC
I think so as well...

Anyway before, I had to put this in kernel parameters, or else, black screen and hang up during boot:
  radeon.dpm=0

And since I put this in grub.cfg and fstab, never had a black screen again:
  rootflags=relatime,lazytime,commit=60 iommu=noagp radeon.dpm=1

I don't understand why, but it worked...
I have pcie XFX HD6850 AMD 1055T FOSS drivers, kernel 4.4.14 .

Maybe it could help you to troubleshoot this error too.
I think it's worth a try, if you have spare time.

If this doesn't help you... Then I'm sorry...
Comment 82 Weber K. 2016-09-20 05:05:15 UTC
Ooops...
Forgot to mention... I appreciate your good work supporting the kernel!
Thank all of you very much!
Comment 83 Michel Dänzer 2016-09-20 06:15:14 UTC
(In reply to Weber K. from comment #81)
> And since I put this in grub.cfg and fstab, never had a black screen again:
>   rootflags=relatime,lazytime,commit=60 iommu=noagp radeon.dpm=1

It could be iommu=noagp, at least that would make more sense than filesystem flags...
