Bug 85041 - Random hangs with HD 3000 Intel GPU
Summary: Random hangs with HD 3000 Intel GPU
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - Intel)
Hardware: All Linux
Importance: P2 normal
Assignee: intel-gfx-bugs@lists.freedesktop.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-09-23 10:43 UTC by Thomas Mann
Modified: 2014-11-09 22:03 UTC

See Also:
Kernel Version: 3.16.3
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
Xorg log, no errors are shown here (29.13 KB, text/plain)
2014-09-23 10:55 UTC, Thomas Mann

Description Thomas Mann 2014-09-23 10:43:26 UTC
My Lenovo X220 laptop has random GPU hangs.

Because the display is no longer rendered when this happens, I have not been able to dump any traces so far.

This is a Sandy Bridge machine, so the patch posted here doesn't apply, as it targets the Ivy Bridge architecture: http://lists.freedesktop.org/archives/intel-gfx/2014-July/048683.html
Comment 1 Thomas Mann 2014-09-23 10:49:19 UTC
The laptop is only used with an external display attached, so the hangs may be related to that.
Comment 2 Thomas Mann 2014-09-23 10:53:03 UTC
I use these versions of the X11 drivers:

mesa-10.3.0
xf86-video-intel-2.99.916

Downgrading mesa or xf86-video-intel doesn't solve the problem.
Comment 3 Thomas Mann 2014-09-23 10:55:41 UTC
Created attachment 151591
Xorg log, no errors are shown here
Comment 4 Jani Nikula 2014-09-23 13:50:56 UTC
Please drop any extra i915.enable_fbc or i915.enable_rc6 etc. module parameters. Can you reproduce the problem with those removed?

Do you see reports about gpu hangs in the dmesg? Please attach the error state.
Comment 5 Thomas Mann 2014-09-23 14:28:29 UTC
I dropped both extra i915 options and could still reproduce the error (therefore I reactivated them); I did this before I reported the bug here. As the hangs occur randomly, I will try to save the error state the next time the error happens. There are no errors in dmesg at all.
Comment 6 Thomas Mann 2014-09-23 14:30:32 UTC
By the way, with 3.15.9 I didn't have this error at all, but maybe I was just lucky not to trigger it. My impression is that the error was introduced with 3.16.
Comment 7 Rodrigo Vivi 2014-09-23 23:19:39 UTC
Could you please test with our drm-intel-nightly branch at http://cgit.freedesktop.org/drm-intel

A bisect would be very useful as well.
Comment 8 Jani Nikula 2014-09-24 08:08:18 UTC
(In reply to rauchwolke from comment #5)
> i dropped both i915 extra options and i could reproduce the error (therefore
> i reactivated them)

These parameters are for debugging and testing only, and we don't support changing them from their platform specific defaults. From 3.18 we'll start tainting the kernel if those parameters have been modified.
Comment 9 Thomas Mann 2014-09-24 16:48:35 UTC
The hang happened again and I saved the contents of the file /sys/class/drm/card0/error:

"no error state collected"

It seems the hang doesn't trigger an error.
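
A minimal sketch of how the error state could be captured right after a hang, assuming the path /sys/class/drm/card0/error quoted above and the placeholder text "no error state collected" seen in this report. The function names and the output file name are hypothetical, and reading the file may require root.

```python
from pathlib import Path

# Path taken from the comment above; the card index may differ on other systems.
ERROR_STATE = Path("/sys/class/drm/card0/error")
# Placeholder text reported in this thread when nothing was captured.
EMPTY_MARKER = "no error state collected"


def has_error_state(text: str) -> bool:
    """Return True if the text looks like a captured error state."""
    stripped = text.strip()
    return bool(stripped) and not stripped.lower().startswith(EMPTY_MARKER)


def save_error_state(source: Path, dest: Path) -> bool:
    """Copy the error state to dest; return False if none was collected."""
    text = source.read_text(errors="replace")
    if not has_error_state(text):
        return False
    dest.write_text(text)
    return True


if __name__ == "__main__":
    # The output file name is an arbitrary choice for this sketch.
    saved = save_error_state(ERROR_STATE, Path("i915_error_state.txt"))
    print("saved" if saved else EMPTY_MARKER)
```

Run from a text console or over SSH while the screen is frozen, this would leave a file that can be attached to the bug whenever the driver did record a hang.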

The OS still worked normally, but the screen wasn't repainted at all.
It is also still possible to move the mouse and interact with apps such as an audio player through shortcuts, but the screen isn't updated.

How do I do a bisect? I use Gentoo, and as drm-intel-nightly isn't in the Portage tree, I don't want to mess around with my system.
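
As a conceptual sketch only: a bisect is a binary search over an ordered list of commits between a known-good and a known-bad kernel. In practice git drives this with `git bisect start`, `git bisect good <rev>` and `git bisect bad <rev>`, and each step means building and booting a kernel. The commit labels and the `is_bad` check below are hypothetical stand-ins for that manual test.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit.

    Assumes commits[0] is good, commits[-1] is bad, and every commit
    after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return commits[hi]


if __name__ == "__main__":
    # Hypothetical history; real bisects work on git revisions.
    history = ["good", "c1", "c2", "c3", "c4", "c5", "bad"]
    known_bad = {"c4", "c5", "bad"}  # stand-in for "boot it and wait for a hang"
    print(first_bad(history, lambda c: c in known_bad))
```

Because the hang here is random, a kernel should run long enough before it is marked good; a false "good" verdict sends the search in the wrong direction.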

My kernel command line no longer includes any i915.* options.
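
One way to double-check that claim is to look for i915.* tokens on the running kernel's command line. This is a small sketch assuming the usual /proc/cmdline file; the function name is hypothetical.

```python
from pathlib import Path


def i915_options(cmdline: str) -> dict[str, str]:
    """Return the i915.* options found on a kernel command line."""
    found = {}
    for token in cmdline.split():
        if token.startswith("i915."):
            # Split "i915.enable_rc6=1" into name and value.
            name, _, value = token.partition("=")
            found[name] = value
    return found


if __name__ == "__main__":
    # An empty dict means no i915 module parameters were passed at boot.
    print(i915_options(Path("/proc/cmdline").read_text()))
```

Note that this only covers options given on the kernel command line; parameters set through modprobe configuration would not show up here.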
Comment 10 Thomas Mann 2014-09-24 16:59:49 UTC
"The screen isn't updated" means everything except the mouse pointer, whose movement is updated normally. By the way, I use KWin with compositing enabled.
Comment 11 Rodrigo Vivi 2014-09-26 19:41:15 UTC
Please try i915.enable_psr=0
Comment 12 Thomas Mann 2014-09-28 14:24:13 UTC
It seems the error is related to glamor support in the Intel driver.

After I disabled glamor and removed it from my system, I couldn't reproduce the GPU hang over the last few days.

The weird thing is that I didn't use glamor at all, as I have this in my Xorg configuration:

   Option      "AccelMethod" "sna"

But the driver was loaded:

[ 35032.248] (II) LoadModule: "glamoregl"
[ 35032.600] (II) Loading /usr/lib64/xorg/modules/libglamoregl.so
[ 35032.681] (II) Module glamoregl: vendor="X.Org Foundation"

Is this hang still kernel related, or is it an xf86-video-intel bug?
Comment 13 Thomas Mann 2014-09-28 14:25:06 UTC
After I removed glamor from my system, the Xorg log output now looks like this:

[    23.125] (II) LoadModule: "glamoregl"
[    23.244] (WW) Warning, couldn't open module glamoregl
[    23.244] (II) UnloadModule: "glamoregl"
[    23.244] (II) Unloading glamoregl
[    23.244] (EE) Failed to load module "glamoregl" (module does not exist, 0)
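
Log excerpts like the one above can be filtered for warnings and errors mechanically. The sketch below assumes the line layout seen in this report (a bracketed timestamp, a two-character marker in parentheses, then the message); the names `LINE`, `SAMPLE` and `problems` are hypothetical.

```python
import re

# Matches lines such as: [    23.244] (EE) Failed to load module ...
LINE = re.compile(r"^\[\s*(?P<time>\d+\.\d+)\]\s+\((?P<level>..)\)\s+(?P<msg>.*)$")

# Sample copied from the log excerpt in this comment.
SAMPLE = """\
[    23.125] (II) LoadModule: "glamoregl"
[    23.244] (WW) Warning, couldn't open module glamoregl
[    23.244] (II) UnloadModule: "glamoregl"
[    23.244] (II) Unloading glamoregl
[    23.244] (EE) Failed to load module "glamoregl" (module does not exist, 0)
"""


def problems(log_text: str) -> list[tuple[str, str]]:
    """Return (level, message) pairs for warning (WW) and error (EE) lines."""
    out = []
    for line in log_text.splitlines():
        m = LINE.match(line.strip())
        if m and m.group("level") in ("WW", "EE"):
            out.append((m.group("level"), m.group("msg")))
    return out


if __name__ == "__main__":
    for level, msg in problems(SAMPLE):
        print(level, msg)
```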
Comment 14 Thomas Mann 2014-09-28 16:51:35 UTC
(In reply to rauchwolke from comment #13)
> after i removed glamor from my system the Xorg log output now looks like
> this:
> 
> [    23.125] (II) LoadModule: "glamoregl"
> [    23.244] (WW) Warning, couldn't open module glamoregl
> [    23.244] (II) UnloadModule: "glamoregl"
> [    23.244] (II) Unloading glamoregl
> [    23.244] (EE) Failed to load module "glamoregl" (module does not exist,
> 0)

After removing

#       Load  "glamoregl"

from the Xorg configuration, these lines are gone.
Comment 15 Thomas Mann 2014-09-29 10:03:49 UTC
I could reproduce the bug again, so glamoregl wasn't responsible.

(In reply to Rodrigo Vivi from comment #11)
> Please try i915.enable_psr=0

I added this option. What exactly does enable_psr do?
Comment 16 Rodrigo Vivi 2014-09-29 20:50:30 UTC
Did i915.enable_psr=0 solve your issue?

Panel Self Refresh (PSR) is a power-saving feature: while the image on your screen is static, the panel keeps displaying it on its own and some display components are shut off. When the screen needs an update, everything is powered on again, the hardware exits PSR, and you get screen updates.
At some point in 3.16 or 3.17 we tried to enable it by default, but it caused some frozen and blank screens and was reverted. I'm afraid you are using a kernel with this feature enabled, since the behaviour you describe is missing screen updates when not moving the mouse and no GPU error state.
So please let me know the output of your /sys/kernel/debug/dri/0/i915_edp_psr_status

Thanks,
Rodrigo.
Comment 17 Thomas Mann 2014-09-29 23:42:25 UTC
I played around a bit with the kernel, and it seems that this patch http://lists.freedesktop.org/archives/intel-gfx/2014-July/048683.html together with xf86-video-intel built without glamoregl support solved the issue.

When I use the kernel patch with glamor support I experience GPU hangs, and when I revert the kernel patch and use xf86-video-intel without glamor I also experience GPU hangs.

I will try the kernel option again, as I removed it for the tests, but it seems PSR isn't activated at all:

cat /sys/kernel/debug/dri/0/i915_edp_psr_status
Sink_Support: no
Source_OK: no
Enabled: no
Performance_Counter: 0
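
The status output above is a simple list of "key: value" lines, so it can be checked in a script. This is a sketch under two assumptions: the file keeps this layout, and an active PSR would report "yes" in the Enabled field (only "no" appears in this report). The function names are hypothetical, and debugfs usually requires root.

```python
from pathlib import Path

# Path quoted earlier in this thread.
PSR_STATUS = Path("/sys/kernel/debug/dri/0/i915_edp_psr_status")


def parse_status(text: str) -> dict[str, str]:
    """Parse 'key: value' lines into a dictionary."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            fields[key.strip()] = value.strip()
    return fields


def psr_enabled(fields: dict[str, str]) -> bool:
    """Assumes 'yes' marks an enabled feature; only 'no' is seen in this report."""
    return fields.get("Enabled", "no").lower() == "yes"


if __name__ == "__main__":
    status = parse_status(PSR_STATUS.read_text())
    print("PSR enabled" if psr_enabled(status) else "PSR not enabled")
```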
Comment 18 Rodrigo Vivi 2014-09-29 23:50:20 UTC
Yeah, so never mind about PSR and that kernel option I told you about.
Comment 19 Thomas Mann 2014-09-29 23:52:57 UTC
As soon as the GPU hangs, the background is stuck, but mouse movement, and even the cursor changes when I glide over an input field or a link, work normally. The GPU hangs randomly, most of the time when I use Firefox and move the mouse.
Comment 20 Rodrigo Vivi 2014-09-30 00:20:09 UTC
Still no GPU error state?
Comment 21 Thomas Mann 2014-10-02 18:45:51 UTC
No. I'm now running 3.16.3 with http://lists.freedesktop.org/archives/intel-gfx/2014-July/048683.html and the Xorg driver without glamoregl, which seems to be stable and produces no GPU hangs.
Comment 22 Thomas Mann 2014-10-04 19:06:24 UTC
I could reproduce the error. It seems to happen less often when I use Xorg without glamoregl.

I still get:

no error state collected
Comment 23 Thomas Mann 2014-10-15 16:12:04 UTC
It seems 3.16.5 fixes the problem and the hangs are now gone. I will close the bug and reopen it if the error occurs again. Thanks for your time and help.
Comment 24 Thomas Mann 2014-10-15 16:33:10 UTC
Just a few minutes after I posted this, I could reproduce the bug again :(

no error state collected
Comment 25 Rodrigo Vivi 2014-10-15 18:44:28 UTC
Please collect and attach latest i915_error_state.
Thanks.
Comment 26 Thomas Mann 2014-11-09 22:03:00 UTC
It seems the bug was fixed between 3.17 and 3.17.2. I have been running this kernel for some time now without a hang. Thanks for your help.
