I've already tried several radeon boot flags, such as DPM on and off, BAPM on and off, without success. The problem does not occur with the fglrx or Windows driver. I'm using glamor acceleration. I'm using an external 4k display in FullHD resolution, the internal laptop display is disabled.
I've read similar bugs here:
But I'm not sure which of the tips are still relevant in the current tree.
The results are:
Playing a simple car racing game, I can do:
dpm=0 -> 2 laps, then the system shuts down due to thermal zone overheating (a regular shutdown, not a sudden one).
dpm=-1 -> ~1 lap, the system suddenly turns off hard without warning.
bapm=1 -> ~1 lap, hard turn off
Created attachment 203661
Created attachment 203671
dmesg 4.2 bapm=1 hang on radeon load.
This is from a boot where the system already hung during boot with BAPM active. That doesn't always happen, though.
Also note I can use the system pretty much without problems if I don't use 3D.
3D is used for everything (even "2D" apps). Is the problem specific to this car racing game? It might be a bug in the mesa driver which causes the GPU to lock up. Also make sure the fan and heatsink are free of dust.
No, the problem occurs with any 3D load, and usually not with 2D load. Maybe even glxgears is enough.
Yes, I've cleaned the fan and heatsink area. It didn't help. :/
Actually, I forgot to mention that I used to have problems without active 3D use as well, but I'm now switching the DPM profile to battery during boot, and that mostly solved that part. Also, high CPU use by itself is not a problem. When not using DPM, I didn't switch to the battery/low power profile.
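For reference, switching the DPM profile as described could look roughly like the sketch below. The function name set_dpm_state is made up for illustration, and the default path assumes the GPU is card0, as in the commands elsewhere in this report. The three accepted values are the ones the radeon power_dpm_state node documents.

```shell
# Hypothetical helper: write a DPM state to the radeon sysfs node.
# Assumption: the GPU is card0; pass a different node as the second argument if not.
set_dpm_state() {
    state=$1
    node=${2:-/sys/class/drm/card0/device/power_dpm_state}
    case "$state" in
        battery|balanced|performance) ;;   # states accepted by power_dpm_state
        *) echo "unknown DPM state: $state" >&2; return 1 ;;
    esac
    printf '%s\n' "$state" > "$node"
}
```

Called as root from a boot script, `set_dpm_state battery` would select the low-power profile.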
Created attachment 203781
Temperature Graph 1 (crash not heat related in this case)
I'm having some doubts about the temperature hypothesis now. With DPM active, temperatures during boot-up seem to be higher than during the freezes or reboots (which are visible as gaps in the graph; measurements are 5 seconds apart).
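A temperature log sampled every 5 seconds could be collected with a loop along these lines. This is only a sketch: the function names are made up, and the sensor path is an assumption (radeon normally exposes a hwmon temp1_input node reporting millidegrees Celsius, but the hwmon index differs between systems).

```shell
# Hypothetical helper: convert one hwmon reading (millidegrees Celsius) to whole degrees.
read_temp_c() {
    raw=$(cat "$1") || return 1
    echo $(( raw / 1000 ))
}

# Hypothetical helper: append "epoch_seconds temperature" to a log file every
# `interval` seconds (default 5) until interrupted.
log_temps() {
    sensor=$1
    logfile=$2
    interval=${3:-5}
    while :; do
        printf '%s %s\n' "$(date +%s)" "$(read_temp_c "$sensor")" >> "$logfile"
        sync    # flush to disk so the last samples survive a hard power-off
        sleep "$interval"
    done
}
```

For example: `log_temps /sys/class/drm/card0/device/hwmon/hwmon0/temp1_input ~/gpu-temp.log 5 &` (adjust the hwmon path to whatever exists on the machine). A freeze or reboot then shows up as a gap between consecutive timestamps.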
So with DPM there seems to be a different issue than without it. I've tried disabling HyperZ, to no avail.
Created attachment 203791
Temperature Graph BAPM=1
With BAPM, overheating looks like the more likely cause. But I can't reproduce the lockups/sudden reboots at all at the moment.
Booting with radeon.hard_reset=1 I get this error at the point where I usually get a hang:
GPU lockup (current fence id 0x0000000000004aa1 last fence id 0x0000000000004aa8 on ring 0)
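To pull such messages out of a longer dmesg log, something like the sketch below could be used. The function name lockup_summary is hypothetical. It assumes the message format shown above and treats the difference between the two fence IDs as the number of fences still outstanding on that ring, which is my reading of the message rather than something confirmed in this report.

```shell
# Hypothetical helper: read kernel log text on stdin and print, for each radeon
# "GPU lockup" line, the ring number and the gap between last and current fence ID.
lockup_summary() {
    sed -n -E 's/.*GPU lockup \(current fence id (0x[0-9a-f]+) last fence id (0x[0-9a-f]+) on ring ([0-9]+)\).*/\1 \2 \3/p' |
    while read -r current last ring; do
        # Shell arithmetic accepts the 0x-prefixed hex values directly.
        echo "ring=$ring pending=$(( $last - $current ))"
    done
}
```

Typical use would be `dmesg | lockup_summary`.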
Created attachment 203801
Temperature Graph Hard off radeon.fastfb=1 radeon.pcie_gen2=1 radeon.audio=0 radeon.hard_reset=1
OK, the hard power-off is reproduced. The temperature reaches almost 110°C.
If you'd like any information, please let me know now, because there does not seem to be much interest in finding the problem. Otherwise I'll have to switch back to the proprietary driver.
(It turned out that I used a different version of the game, which crashed the card instead. I also get the hard power-off without any parameters, btw.)
[Wonderful, fglrx doesn't work at all with kernel 4.2... ...]
The fastest way to get the system to overheat is to
- enable redshift (or probably xgamma)
- disable vsync
- set dpm to performance*
echo performance | sudo tee /sys/class/drm/card0/device/power_dpm_state
- activate cpu turbo mode*
echo 1 | sudo tee /sys/devices/system/cpu/cpufreq/boost
- activate BAPM
- activate DRI3
- stay in the game menu (tested with blazrush, kotor)
* Here the effect is not that certain/serious.
Things that help to avoid overheating:
- boot with nomodeset radeon.modeset=1
#vblank_mode=0 glmark2 --run-forever
does not cause a hang. Some of the tests seem less demanding, so the temperature goes up and down.
GALLIUM_HUD=temperature is helpful to watch how fast the temperature climbs.
Created attachment 206301
Temperature Graph with nomodeset radeon.modeset=1 and redshift
Created attachment 206311
Temperature Graph with nomodeset radeon.modeset=1 without redshift
So the problems are worse as summer approaches. Still present in 4.6. The system also shuts down hard with vdpau video playback. The weird thing is that the hard shutdowns occur at a lower temperature if dpm is active than if it isn't.
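To compare the temperature reached before a shutdown with and without DPM, the peak of each run's log can be extracted. The sketch below assumes a hypothetical log format of one "timestamp temperature" pair per line, with the temperature in degrees Celsius; the function name peak_temp is made up as well.

```shell
# Hypothetical helper: print the highest temperature (second column) in a log file.
# Lines with fewer than two fields are ignored; nothing is printed for an empty log.
peak_temp() {
    awk 'NF >= 2 && (max == "" || $2 + 0 > max) { max = $2 + 0 }
         END { if (max != "") print max }' "$1"
}
```

Running `peak_temp dpm-on.log` and `peak_temp dpm-off.log` (file names are placeholders) gives the two peaks side by side.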
Any hints for debugging this?
So, OpenGL and VDPAU crash the GPU. Any chance you could also test OpenCL? Not sure if  works on R600 OpenCL, but  does.
I think I've solved it.
The kernel parameter
seems to work around the issue. The performance is degraded, but the temperatures mostly stay below 80°C. Generally the system appears to stay much cooler.
(In reply to Dionisus Torimens from comment #17)
> I think I've solved it.
> The kernel parameter
> seems to work around the issue. The performance is degraded, but the
> temperatures mostly stay below 80°C. Generally the system appears to stay
> much cooler.
Is this a multi-GPU notebook? That option only affects Hybrid laptops with multiple GPUs.
Ok, true, the issue is still there. No, not multi-GPU.