Created attachment 109311 [details]
lspci -vvv

Hello.

I am running a 3.11 kernel with radeon.dpm enabled via a kernel boot option. Occasionally, under circumstances I have not been able to pin down, my GPU resets. It happens rarely and at random during ordinary desktop use: web browsing, document typing, video playback. No games, video benchmarks, or any other GPU-heavy workloads are involved.

During such a lockup my screen turns black, but the backlight stays on. After a while the image reappears, though it is blurry, and after 10-15 seconds the screen is completely back to normal. Sometimes I am able to continue working after such a reset, but sometimes the screen is stuck on the single image that appeared after the screen was restored, and I have to reboot the machine to fix this. Everything else seems to work fine during lockups; for example, music keeps playing without any problems.

I can't say for sure, but it looks like this bug happens more often when a video is playing in Firefox. (JIC: I don't have Adobe Flash installed.)

My OS is Gentoo amd64 with vanilla kernel 3.11.{0,1}, and this happens with power_dpm_state set to "balanced".

I've attached two dmesg outputs taken after such lockups. I understand that this description is probably too vague to fix the bug, and I am ready to provide any additional info.
Created attachment 109321 [details] dmesg after lockup
Created attachment 109331 [details] another dmesg after lockup
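The initial report depends on two settings: radeon.dpm on the kernel command line and power_dpm_state set to "balanced". For anyone trying to confirm they are in the same configuration, here is a minimal sketch. It assumes the radeon driver exposes the state at /sys/class/drm/card0/device/power_dpm_state and that the GPU is card0; both may differ on other machines. The helper names are made up for illustration.

```python
from pathlib import Path


def read_dpm_state(device_dir="/sys/class/drm/card0/device"):
    """Return the contents of power_dpm_state, or None if it cannot be read.

    device_dir is an assumption: the card may not be card0 on every system.
    """
    path = Path(device_dir) / "power_dpm_state"
    try:
        return path.read_text().strip()
    except OSError:
        # File missing (dpm not active, different driver) or not readable.
        return None


def dpm_requested(cmdline):
    """True if the given kernel command line contains radeon.dpm=1."""
    return "radeon.dpm=1" in cmdline.split()
```

On a live system, dpm_requested(Path("/proc/cmdline").read_text()) checks the boot option, and read_dpm_state() should return "balanced" for the setup described in the report.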
Do you only get the lockups with dpm enabled? If so, try disabling certain dpm features and see if any of them help. See if you can narrow down which if any of them help. E.g.,

diff --git a/drivers/gpu/drm/radeon/rv6xx_dpm.c b/drivers/gpu/drm/radeon/rv6xx_dpm.c
index 5811d27..13c5267 100644
--- a/drivers/gpu/drm/radeon/rv6xx_dpm.c
+++ b/drivers/gpu/drm/radeon/rv6xx_dpm.c
@@ -1981,10 +1981,10 @@ int rv6xx_dpm_init(struct radeon_device *rdev)
 	else
 		pi->fb_div_scale = 0;
 
-	pi->voltage_control =
-		radeon_atom_is_voltage_gpio(rdev, SET_VOLTAGE_TYPE_ASIC_VDDC, 0);
+	pi->voltage_control = false;
+//		radeon_atom_is_voltage_gpio(rdev, SET_VOLTAGE_TYPE_ASIC_VDDC, 0);
 
-	pi->gfx_clock_gating = true;
+	pi->gfx_clock_gating = false;
 
 	pi->sclk_ss = radeon_atombios_get_asic_ss_info(rdev, &ss,
 						       ASIC_INTERNAL_ENGINE_SS, 0);
@@ -1993,13 +1993,14 @@ int rv6xx_dpm_init(struct radeon_device *rdev)
 
 	/* Disable sclk ss, causes hangs on a lot of systems */
 	pi->sclk_ss = false;
+	pi->mclk_ss = false;
 
 	if (pi->sclk_ss || pi->mclk_ss)
 		pi->dynamic_ss = true;
 	else
 		pi->dynamic_ss = false;
 
-	pi->dynamic_pcie_gen2 = true;
+	pi->dynamic_pcie_gen2 = false;
 
 	if (pi->gfx_clock_gating &&
 	    (rdev->pm.int_thermal_type != THERMAL_TYPE_NONE))
(In reply to Alex Deucher from comment #3)
> Do you only get the lockups with dpm enabled? If so, try disabling certain
> dpm features and see if any of them help. See if you can narrow down which
> if any of them help. E.g.,

So far, yes, it happens only with dpm enabled. I'll try switching individual features off and report whether it helps.
Disabling voltage control doesn't help, i.e. I still have this issue after the following change:

-	pi->voltage_control =
-		radeon_atom_is_voltage_gpio(rdev, SET_VOLTAGE_TYPE_ASIC_VDDC, 0);
+	pi->voltage_control = false;
+//		radeon_atom_is_voltage_gpio(rdev, SET_VOLTAGE_TYPE_ASIC_VDDC, 0);

I will try disabling dynamic_pcie_gen2 next.
Disabling dynamic_pcie_gen2 also doesn't help. Will try mclk_ss next.
Disabling mclk_ss doesn't help either. The last option is gfx_clock_gating; I will try it next.
I've just noticed that after such a lockup, if I only restart the X server without fully rebooting the machine, video playback (mplayer) is unavailable. xvinfo shows this:

X-Video Extension version 2.2
screen #0
 no adaptors present

Prior to the lockup, video playback works flawlessly with the very same configuration. I'm not sure whether this info is useful.

These results were obtained with the latest stable kernel, 3.11.6.
Disabling gfx_clock_gating also doesn't help.

There is one more thing that I changed alongside dpm, though: I set aspm=1 for the kernel driver. I will remove aspm=1 from modprobe.conf so that the driver decides for itself whether it should be on or off. I'll report whether this helps.
I've just had another such lockup on a vanilla 3.12 kernel without passing the aspm option to the driver. What should I test now?
Phoronix recently published a series of Radeon card tests. Their setup was kernel 3.12 with dpm enabled, a Mesa 10.0 pre-release, and Ubuntu 13.10. They hit the same problem as I do [0]:

"The Radeon HD 3650 ... was unstable when running the Source Engine tests and resulted in "GPU lockup CP stall for more than 10000msec" and "*ERROR* radeon: fence wait failed" errors. With frequent GPU lock-ups, the HD 3650 tests were abandoned."

[0]: http://www.phoronix.com/scan.php?page=article&item=amd_gallium3d_tf2css&num=2
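The two messages quoted above are what to look for when checking whether a saved dmesg shows the same failure. A minimal sketch follows. It assumes the log has already been captured as text, and the function name is made up for illustration. Only the two strings from the quote are matched, so other radeon reset messages would be missed.

```python
import re

# Patterns taken from the messages quoted in the Phoronix article.
# The stall duration is matched loosely in case it differs from 10000msec.
LOCKUP_PATTERNS = (
    re.compile(r"GPU lockup CP stall for more than \d+msec"),
    re.compile(r"radeon: fence wait failed"),
)


def find_lockup_lines(dmesg_text):
    """Return the lines of dmesg_text that match any lockup pattern."""
    return [
        line
        for line in dmesg_text.splitlines()
        if any(pattern.search(line) for pattern in LOCKUP_PATTERNS)
    ]
```

Passing the contents of a saved log (for example, the output of dmesg redirected to a file) to find_lockup_lines returns only the lines relevant to this bug, which makes it easier to compare lockups across kernel versions.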
Ilya, I have a theory I've been testing, and it holds 'so far'. Do you have any CPU frequency daemons running (e.g. thermald), or is CPUFreq set to a governor other than performance? If you switch to performance and test, do you still get GPU resets?

With GLAMOR and no EXA in xorg.conf, I haven't had a reset in two weeks of varied use, with DPM enabled and the CPU governor set to performance mode.

This might be a big stretch, but anything is possible when it comes to potential triggers.
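To answer the governor question for every core at once, something like the sketch below could be used. It assumes the standard cpufreq sysfs layout (/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor). The function names are made up for illustration.

```python
from pathlib import Path


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Map each CPU name (e.g. "cpu0") to its current cpufreq governor."""
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for path in sorted(Path(cpu_root).glob(pattern)):
        cpu_name = path.parent.parent.name
        try:
            governors[cpu_name] = path.read_text().strip()
        except OSError:
            # Skip CPUs whose governor file cannot be read.
            continue
    return governors


def all_performance(governors):
    """True only if at least one CPU was found and all use "performance"."""
    return bool(governors) and all(
        governor == "performance" for governor in governors.values()
    )
```

Calling all_performance(read_governors()) before a test run confirms that no core is left on another governor such as conservative or ondemand.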
(In reply to Shawn Starr from comment #12)

Hello, Shawn.

I don't have any such daemon, but my cpufreq governor is set to conservative both on AC and on battery. OK, I will try your suggestion, but could you please share what your theory is?
Somehow, when the CPU shifts frequency, it causes voltage changes that affect the GPU as it switches between DPM power states. This week I will rerun my old tests with EXA to see if this holds true, but this time with the cpufreq governor kept in performance mode only.

I could be grasping at straws, but really, nothing would surprise me these days :)
My theory is incorrect, but interestingly I could get the GPU to reset quicker. Still, using EXA is unreliable; use GLAMOR in your xorg.conf (if you don't have a newer stack). This was stable, even if it's masking the real issue.
I've noticed that this bug happens more frequently on my machine while running Virtualbox. However, it does occur without Virtualbox as well.
Just got another lockup with 3.12.6 kernel. Ping.
This issue still occurs with 3.12.8 kernel.
Still occurs with 3.13.2 kernel. Ping.
(In reply to Ilya Tumaykin from comment #19) Same with 3.13.3.
I am having exactly the same issue on Ubuntu 13.10 (kernel 3.11.0-15-generic). My GPU is HD3450 (R620 LE). Apart from the random hang issue, sometimes the kernel crashes during "modprobe radeon". I have to disable dpm for now.
Reproducible with 3.14.0 kernel.
It occurs at least once a day on 3.18.0-031800rc4-generic.
Still happens on 3.19.2-1-ARCH. After blanking, sometimes the image comes back on; I can move the cursor, but the interface is unresponsive.
Still occurs on kernel 4.7.1-1-ARCH.