Bug 15389

Summary: 945GM: GPU hangs after half an hour with video overlay enabled.
Product: Drivers Reporter: tomas m (tmezzadra)
Component: Video(DRI - Intel) Assignee: drivers_video-dri-intel (drivers_video-dri-intel)
Status: RESOLVED INVALID    
Severity: normal CC: chris, gordon.jin, jbarnes
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.33 Subsystem:
Regression: No Bisected commit-id:
Attachments: intel gpu dump of the freeze

Description tomas m 2010-02-24 23:30:46 UTC
HARDWARE:
00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller (rev 03)



xorg.conf relevant section:

Section "Device"
	Identifier  "Card0"
	Driver      "intel"
	VendorName  "Intel Corporation"
	BoardName   "Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller"
	BusID       "PCI:0:2:0"
	Option	    "XvPreferOverlay" "True"
EndSection



xf86-video-intel 2.10.0


After playing a video for about half an hour, the GPU hangs; everything else works. I can switch to a VT and reboot the system.
Comment 1 tomas m 2010-02-24 23:31:36 UTC
Created attachment 25200 [details]
intel gpu dump of the freeze
Comment 2 tomas m 2010-02-25 11:54:31 UTC
more information:


On a given freeze, the GPU does not know it froze. I tested this by applying Chris Wilson's patch here: http://bugs.freedesktop.org/attachment.cgi?id=33416

The i915_error_state says it didn't collect any errors.

Issuing 'echo 1 > i915_wedged' kind of resets the GPU.

Then I tried to restart X and the following backtrace appeared (X hung). This is probably related to the GPU not being initialized correctly, but it might prove useful.

------------[ cut here ]------------
kernel BUG at drivers/gpu/drm/i915/intel_display.c:1917!
invalid opcode: 0000 [#1] PREEMPT SMP 
last sysfs file: /sys/devices/virtual/backlight/acpi_video0/actual_brightness
Modules linked in: fuse ipv6 arc4 ecb gspca_zc3xx gspca_main videodev v4l1_compat usbhid hid rt73usb rt2x00usb rt2x00lib mac80211 cfg80211 rfkill joydev evdev mmc_block snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_hda_codec_si3054 snd_hda_codec_realtek 8139too mii snd_hda_intel sdhci_pci sdhci snd_hda_codec snd_hwdep snd_pcm_oss mmc_core snd_pcm snd_mixer_oss led_class snd_timer battery uhci_hcd firewire_ohci firewire_core snd soundcore snd_page_alloc crc_itu_t iTCO_wdt iTCO_vendor_support ac ehci_hcd thermal psmouse usbcore serio_raw i2c_i801 cpufreq_ondemand acpi_cpufreq freq_table processor rtc_cmos rtc_core rtc_lib ext3 jbd mbcache sg sr_mod cdrom sd_mod ata_piix ata_generic pata_acpi libata scsi_mod

Pid: 5990, comm: X Not tainted 2.6.33-lappy-dirty #13 Everex StepNote Series/Everex StepNote Series
EIP: 0060:[<c121a911>] EFLAGS: 00213282 CPU: 0
EIP is at intel_crtc_dpms_overlay+0x31/0x50
EAX: fffffffb EBX: f7100ae0 ECX: 00000000 EDX: 00036cff
ESI: 00071008 EDI: f7368000 EBP: 00070180 ESP: f3c51bc0
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process X (pid: 5990, ti=f3c50000 task=f5261360 task.ti=f3c50000)
Stack:
 00006018 c121d318 f711b000 00061200 00070184 f7027400 00000000 00000001
<0> f711b000 f7027400 00000003 00000001 c12198cb ffffffcf c121fd97 00000060
<0> c1328ac0 c1329e80 f7027700 f711b000 c121827d c11eca67 00000001 c13b5a4b
Call Trace:
 [<c121d318>] ? i9xx_crtc_dpms+0x1c8/0x2e0
 [<c12198cb>] ? intel_crtc_dpms+0x2b/0x100
 [<c121fd97>] ? intel_lvds_set_power+0xc7/0xd0
 [<c121827d>] ? intel_crtc_prepare+0xd/0x10
 [<c11eca67>] ? drm_crtc_helper_set_mode+0x1c7/0x360
 [<c11ed63e>] ? drm_crtc_helper_set_config+0x7ee/0x8a0
 [<c12fb201>] ? __mutex_lock_slowpath+0x191/0x2d0
 [<c11fe420>] ? drm_mode_setcrtc+0x140/0x350
 [<c11f1fa8>] ? drm_ioctl+0x218/0x380
 [<c11fe2e0>] ? drm_mode_setcrtc+0x0/0x350
 [<c11f1d90>] ? drm_ioctl+0x0/0x380
 [<c10e74ab>] ? vfs_ioctl+0x2b/0xa0
 [<c10e7689>] ? do_vfs_ioctl+0x79/0x5c0
 [<c10a5a50>] ? filemap_fault+0x0/0x3d0
 [<c10bdade>] ? handle_mm_fault+0xee/0x780
 [<c10d9a86>] ? rw_verify_area+0x66/0xe0
 [<c10e7c46>] ? sys_ioctl+0x76/0x90
 [<c10037df>] ? sysenter_do_call+0x12/0x28
Code: 98 58 04 00 00 85 db 74 26 8b 03 83 c0 14 e8 47 0a 0e 00 89 d8 e8 90 04 01 00 85 c0 74 13 31 d2 89 d8 e8 d3 02 01 00 85 c0 74 e8 <0f> 0b eb fe 5b c3 8b 03 5b 83 c0 14 e9 0e 06 0e 00 8d b4 26 00 
EIP: [<c121a911>] intel_crtc_dpms_overlay+0x31/0x50 SS:ESP 0068:f3c51bc0
---[ end trace ab5b698c4a1f815a ]---
Comment 3 Chris Wilson 2010-02-25 12:08:40 UTC
Doesn't look like a kernel bug, the gpu is not wedged at all and has finished processing with no more work pending. Sounds like userspace has hung.
Comment 4 tomas m 2010-02-25 12:12:35 UTC
where should i send this to then? X?
Comment 5 Chris Wilson 2010-02-25 12:29:51 UTC
Since the gpu has nothing to do, the question is what has stalled. And to answer that we will need more information, a stacktrace from X and its Xorg.log during a hang would be the first step. The only advantage of moving to bugs.fd.o at this point is that it has a slightly wider audience...
Comment 6 tomas m 2010-02-25 14:56:49 UTC
(In reply to comment #5)
> Since the gpu has nothing to do, the question is what has stalled. And to
> answer that we will need more information, a stacktrace from X and its
> Xorg.log
> during a hang would be the first step. The only advantage of moving to
> bugs.fd.o at this point is that it has a slightly wider audience...

I built the server with debug symbols.

I followed http://wiki.x.org/wiki/Development/Documentation/ServerDebugging#Thebasics

But when the hang occurs, gdb does not interrupt, and there is no way to halt the execution in order to get a stack trace. Is there a way to do this?
Comment 7 Jesse Barnes 2010-02-26 23:10:52 UTC
You can see where X is stuck in the kernel:
cat /proc/<pid of X>/wchan will tell you which kernel function it's in.
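
A minimal sketch of that check, wrapped in a small shell function. It prints the process state from /proc/<pid>/status alongside wchan, since state D (uninterruptible sleep) together with a function name in wchan points at the kernel, while state R or S with an empty wchan suggests the process is not blocked inside a kernel function. The function name show_wait_state and the PROC_ROOT override are made up for illustration and are not part of any existing tool.

```shell
# show_wait_state PID
# Prints the scheduler state and kernel wait channel of a process.
# PROC_ROOT is a hypothetical override (defaults to /proc) so the
# function can be tried against a fake directory tree.
show_wait_state() {
    pid="$1"
    proc="${PROC_ROOT:-/proc}"

    if [ ! -d "$proc/$pid" ]; then
        echo "no such process: $pid" >&2
        return 1
    fi

    # The "State:" line looks like "State:  D (disk sleep)"; keep the letter.
    state=$(awk '/^State:/ {print $2; exit}' "$proc/$pid/status" 2>/dev/null)

    # wchan holds the kernel function the task sleeps in, if any.
    wchan=$(cat "$proc/$pid/wchan" 2>/dev/null)

    echo "state: ${state:-unknown}"
    echo "wchan: ${wchan:-<empty>}"
}
```

It would be run during a hang as, for example, show_wait_state "$(pidof X)", assuming pidof is available and the server binary is named X.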
Comment 8 tomas m 2010-02-27 12:36:17 UTC
(In reply to comment #7)
> You can see where X is stuck in the kernel:
> cat /proc/<pid of X>/wchan will tell you which kernel function it's in.

wchan is empty (during the freeze). Am I supposed to read it with gdb running and attached to X?

On a side note, this happens using mythtv. I'm not sure if testing another video player is of relevance. Is it?
Comment 9 Gordon Jin 2010-03-01 03:37:27 UTC
A side question, why do you prefer overlay? Does it work if you don't force overlay?

In case you want to move it to fd.o, please refer to http://www.intellinuxgraphics.org/how_to_report_bug.html.
Comment 10 tomas m 2010-03-01 12:50:35 UTC
Yes, it works for me without overlay, but I think that's beside the point. I don't use overlay myself.

I will move it to fd.o after I wake up ;)
Comment 11 Jesse Barnes 2010-07-23 19:55:37 UTC
ok moved to fdo I suppose.