Bug 12166

Summary: [mi] EQ overflowing. The server is probably stuck in an infinite loop.
Product: Drivers Reporter: Michael (auslands-kv)
Component: Video(DRI - non Intel) Assignee: drivers_video-dri
Status: RESOLVED CODE_FIX    
Severity: blocking CC: gordon.jin, jonnylamb, kdan, krcroft, moneta.mace, mozilla_bugs, nemesis
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.28-rc6 and 2.6.28-rc7 Subsystem:
Regression: No Bisected commit-id:
Attachments: Xorg.0.log from affected system
Xorg.0.log after resume from suspend to ram
Kernel config of 2.6.28-rc7
xorg.conf
/proc/dri/0/i915_gem_interrupts when server hangs
backtrace before hang
backtrace after hang
strace when hanging server (switch to VT1 and then to VT7)
Xorg.0.log from hanged server

Description Michael 2008-12-05 00:15:09 UTC
Latest working kernel version: 2.6.27.7
Earliest failing kernel version: 2.6.28-rc6 (haven't tested any before)
Distribution: Debian Sid (Sidux) amd64
Hardware Environment: Thinkpad X200s; Intel Graphics Media Accelerator 4500MHD
Software Environment:
xserver-xorg: 1:7.4
xserver-xorg-video-intel: 2.5.1-1
libgl1-mesa-dri: 7.2-1
libdrm2, libdrm-intel1: 2.4.1+git+20081116+930c0e7-1
 
Problem Description:
Very soon after starting X, the X server hangs with the following errors in the Xorg log (from 2.6.28-rc6):

[mi] EQ overflowing. The server is probably stuck in an infinite loop.

Backtrace:
0: /usr/bin/X(xorg_backtrace+0x26) [0x4ebd56]
1: /usr/bin/X(mieqEnqueue+0x291) [0x4cc991]
2: /usr/bin/X(xf86PostMotionEventP+0xc4) [0x496604]
3: /usr/bin/X(xf86PostMotionEvent+0xb1) [0x4967e1]
4: /usr/lib/xorg/modules/input//evdev_drv.so [0x7f412c029082]
5: /usr/bin/X [0x47fa35]
6: /usr/bin/X [0x470817]
7: /lib/libpthread.so.0 [0x7f4143c4ea80]
8: /lib/libc.so.6(ioctl+0x7) [0x7f41422c2277]
9: /usr/lib/libdrm.so.2 [0x7f4140eb8bf1]
10: /usr/lib/libdrm.so.2(drmWaitVBlank+0x20) [0x7f4140eb90a0]
11: /usr/lib/dri/i965_dri.so [0x7f412fd8769e]
12: /usr/lib/dri/i965_dri.so(driWaitForVBlank+0xa3) [0x7f412fd878b3]
13: /usr/lib/dri/i965_dri.so(intelSwapBuffers+0xe5) [0x7f412fd91165]
14: /usr/lib/dri/i965_dri.so [0x7f412fd88c1f]
15: /usr/lib/xorg/modules/extensions//libglx.so [0x7f414171105f]
16: /usr/lib/xorg/modules/extensions//libglx.so [0x7f4141705036]
17: /usr/lib/xorg/modules/extensions//libglx.so [0x7f41417082d2]
18: /usr/bin/X(Dispatch+0x364) [0x44c2d4]
19: /usr/bin/X(main+0x45d) [0x432bcd]
20: /lib/libc.so.6(__libc_start_main+0xe6) [0x7f41422181a6]
21: /usr/bin/X(FontFileCompleteXLFD+0x269) [0x431fa9]
[mi] mieqEnequeue: out-of-order valuator event; dropping.
[mi] EQ overflowing. The server is probably stuck in an infinite loop.
[mi] mieqEnequeue: out-of-order valuator event; dropping.
[mi] EQ overflowing. The server is probably stuck in an infinite loop.
[mi] mieqEnequeue: out-of-order valuator event; dropping.

The last lines repeat over and over. The keyboard is no longer functional. The mouse still works, but the server does not react to clicks.

Everything works fine under kernel 2.6.27.7. Kernel 2.6.28-rc7 shows the same problems.

Steps to reproduce:
Start X on the described system. After a few seconds to minutes, the above behaviour occurs.
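
A minimal sketch for quantifying the symptom, assuming the log lines look exactly like the excerpt above. The function name `count_eq_messages` and the pattern keys are illustrative and not part of any X.Org tool; the "Enequeue" spelling is copied from the log as printed.

```python
# Message fragments copied verbatim from the Xorg log excerpt in this report.
PATTERNS = {
    "eq_overflow": "[mi] EQ overflowing.",
    "out_of_order": "[mi] mieqEnequeue: out-of-order valuator event; dropping.",
}

def count_eq_messages(lines):
    """Count how often each known [mi] message appears in an iterable of log lines."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, fragment in PATTERNS.items():
            if fragment in line:
                counts[name] += 1
    return counts

# Hypothetical usage: pass an open log file, for example
#   with open("/var/log/Xorg.0.log", errors="replace") as log:
#       print(count_eq_messages(log))
```

A steadily growing count while the display is frozen would match the behaviour described here; it does not by itself say which component is stuck.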
Comment 1 Michael 2008-12-05 00:16:39 UTC
Created attachment 19154 [details]
Xorg.0.log from affected system
Comment 2 Michael 2008-12-05 00:25:14 UTC
Here are the relevant lines from /var/log/messages:

Dec  5 08:31:59 LaptopMB kernel: [   42.810237] ------------[ cut here ]------------
Dec  5 08:31:59 LaptopMB kernel: [   42.810238] WARNING: at arch/x86/mm/ioremap.c:226 __ioremap_caller+0xcc/0x2ce()
Dec  5 08:31:59 LaptopMB kernel: [   42.810240] Modules linked in: i915 drm hid_dell hid_pl hid_cypress hid_zpff hid_gyration hid_bright hid_sony hid_samsung hid_microsoft hid_tmff hid_monterey hid_ezkey hid_apple hid_a4tech hid_logitech usbhid ff_memless hid_cherry hid_sunplus hid_petalynx hid_belkin rfcomm hid_chicony hidp hid l2cap tun ppdev parport_pc lp parport autofs4 ipv6 acpi_cpufreq cpufreq_conservative cpufreq_ondemand cpufreq_powersave cpufreq_userspace cpufreq_stats freq_table iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle iptable_filter ip_tables x_tables fuse sha256_generic aes_x86_64 aes_generic cbc dm_crypt snd_pcm_oss snd_mixer_oss snd_hda_intel snd_pcm arc4 snd_seq_dummy ecb snd_seq_oss snd_seq_midi_event snd_seq snd_timer snd_seq_device iwlagn iwlcore uvcvideo psmouse rtc_cmos rtc_core compat_ioctl32 i2c_i801 mac80211 rtc_lib snd videodev pcspkr serio_raw i2c_core v4l1_compat iTCO_wdt soundcore snd_page_alloc cfg80211 btusb bluetooth battery video output ac think
Dec  5 08:31:59 LaptopMB kernel: pad_acpi rfkill wmi button led_class intel_agp nvram evdev ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot dm_mod sd_mod ahci ata_generic libata scsi_mod ehci_hcd ide_pci_generic uhci_hcd e1000e ide_core thermal processor fan thermal_sys
Dec  5 08:31:59 LaptopMB kernel: [   42.810333] Pid: 3907, comm: Xorg Not tainted 2.6.28-rc7 #1
Dec  5 08:31:59 LaptopMB kernel: [   42.810335] Call Trace:
Dec  5 08:31:59 LaptopMB kernel: [   42.810339]  [<ffffffff80245aaa>] warn_on_slowpath+0x5d/0x82
Dec  5 08:31:59 LaptopMB kernel: [   42.810343]  [<ffffffff80231d19>] ? change_page_attr_set_clr+0x13b/0x309
Dec  5 08:31:59 LaptopMB kernel: [   42.810346]  [<ffffffff80232160>] ? _set_memory_uc+0x27/0x29
Dec  5 08:31:59 LaptopMB kernel: [   42.810350]  [<ffffffff80491ea8>] ? _read_lock+0x9/0x16
Dec  5 08:31:59 LaptopMB kernel: [   42.810352]  [<ffffffff80230bfe>] __ioremap_caller+0xcc/0x2ce
Dec  5 08:31:59 LaptopMB kernel: [   42.810355]  [<ffffffff80230f10>] ? ioremap_wc+0x27/0x2b
Dec  5 08:31:59 LaptopMB kernel: [   42.810358]  [<ffffffff80230ee7>] ioremap_nocache+0x17/0x19
Dec  5 08:31:59 LaptopMB kernel: [   42.810360]  [<ffffffff80230f10>] ioremap_wc+0x27/0x2b
Dec  5 08:31:59 LaptopMB kernel: [   42.810368]  [<ffffffffa0580d5f>] i915_gem_entervt_ioctl+0x456/0x4eb [i915]
Dec  5 08:31:59 LaptopMB kernel: [   42.810381]  [<ffffffffa0560ca4>] drm_ioctl+0x1db/0x263 [drm]
Dec  5 08:31:59 LaptopMB kernel: [   42.810388]  [<ffffffffa0580909>] ? i915_gem_entervt_ioctl+0x0/0x4eb [i915]
Dec  5 08:31:59 LaptopMB kernel: [   42.810391]  [<ffffffff802d23ab>] vfs_ioctl+0x64/0x7d
Dec  5 08:31:59 LaptopMB kernel: [   42.810394]  [<ffffffff802d276c>] do_vfs_ioctl+0x3a8/0x3da
Dec  5 08:31:59 LaptopMB kernel: [   42.810397]  [<ffffffff802c6dc9>] ? __fput+0x190/0x19c
Dec  5 08:31:59 LaptopMB kernel: [   42.810400]  [<ffffffff802d27f8>] sys_ioctl+0x5a/0x7c
Dec  5 08:31:59 LaptopMB kernel: [   42.810404]  [<ffffffff8021110a>] system_call_fastpath+0x16/0x1b
Dec  5 08:31:59 LaptopMB kernel: [   42.810406] ---[ end trace d99277641ba4b26c ]---
Dec  5 08:33:09 LaptopMB kernel: [  113.155761] resource map sanity check conflict: 0xd0000000 0xdfffffff 0xd0000000 0xd1feffff vesafb
Dec  5 08:33:09 LaptopMB kernel: [  113.155775] ------------[ cut here ]------------
Dec  5 08:33:09 LaptopMB kernel: [  113.155780] WARNING: at arch/x86/mm/ioremap.c:226 __ioremap_caller+0xcc/0x2ce()
Dec  5 08:33:09 LaptopMB kernel: [  113.155785] Modules linked in: i915 drm hid_dell hid_pl hid_cypress hid_zpff hid_gyration hid_bright hid_sony hid_samsung hid_microsoft hid_tmff hid_monterey hid_ezkey hid_apple hid_a4tech hid_logitech usbhid ff_memless hid_cherry hid_sunplus hid_petalynx hid_belkin rfcomm hid_chicony hidp hid l2cap tun ppdev parport_pc lp parport autofs4 ipv6 acpi_cpufreq cpufreq_conservative cpufreq_ondemand cpufreq_powersave cpufreq_userspace cpufreq_stats freq_table iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle iptable_filter ip_tables x_tables fuse sha256_generic aes_x86_64 aes_generic cbc dm_crypt snd_pcm_oss snd_mixer_oss snd_hda_intel snd_pcm arc4 snd_seq_dummy ecb snd_seq_oss snd_seq_midi_event snd_seq snd_timer snd_seq_device iwlagn iwlcore uvcvideo psmouse rtc_cmos rtc_core compat_ioctl32 i2c_i801 mac80211 rtc_lib snd videodev pcspkr serio_raw i2c_core v4l1_compat iTCO_wdt soundcore snd_page_alloc cfg80211 btusb bluetooth battery video output ac think
Dec  5 08:33:09 LaptopMB kernel: pad_acpi rfkill wmi button led_class intel_agp nvram evdev ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot dm_mod sd_mod ahci ata_generic libata scsi_mod ehci_hcd ide_pci_generic uhci_hcd e1000e ide_core thermal processor fan thermal_sys
Dec  5 08:33:09 LaptopMB kernel: [  113.156013] Pid: 3907, comm: Xorg Tainted: G        W  2.6.28-rc7 #1
Dec  5 08:33:09 LaptopMB kernel: [  113.156018] Call Trace:
Dec  5 08:33:09 LaptopMB kernel: [  113.156029]  [<ffffffff80245aaa>] warn_on_slowpath+0x5d/0x82
Dec  5 08:33:09 LaptopMB kernel: [  113.156038]  [<ffffffff80231d19>] ? change_page_attr_set_clr+0x13b/0x309
Dec  5 08:33:09 LaptopMB kernel: [  113.156046]  [<ffffffff80232160>] ? _set_memory_uc+0x27/0x29
Dec  5 08:33:09 LaptopMB kernel: [  113.156055]  [<ffffffff80491ea8>] ? _read_lock+0x9/0x16
Dec  5 08:33:09 LaptopMB kernel: [  113.156063]  [<ffffffff80230bfe>] __ioremap_caller+0xcc/0x2ce
Dec  5 08:33:09 LaptopMB kernel: [  113.156071]  [<ffffffff80230f10>] ? ioremap_wc+0x27/0x2b
Dec  5 08:33:09 LaptopMB kernel: [  113.156078]  [<ffffffff80230ee7>] ioremap_nocache+0x17/0x19
Dec  5 08:33:09 LaptopMB kernel: [  113.156085]  [<ffffffff80230f10>] ioremap_wc+0x27/0x2b
Dec  5 08:33:09 LaptopMB kernel: [  113.156104]  [<ffffffffa0580d5f>] i915_gem_entervt_ioctl+0x456/0x4eb [i915]
Dec  5 08:33:09 LaptopMB kernel: [  113.156114]  [<ffffffff802c5a95>] ? do_sync_write+0xec/0x132
Dec  5 08:33:09 LaptopMB kernel: [  113.156149]  [<ffffffffa0560ca4>] drm_ioctl+0x1db/0x263 [drm]
Dec  5 08:33:09 LaptopMB kernel: [  113.156166]  [<ffffffffa0580909>] ? i915_gem_entervt_ioctl+0x0/0x4eb [i915]
Dec  5 08:33:09 LaptopMB kernel: [  113.156175]  [<ffffffff802d23ab>] vfs_ioctl+0x64/0x7d
Dec  5 08:33:09 LaptopMB kernel: [  113.156182]  [<ffffffff802d276c>] do_vfs_ioctl+0x3a8/0x3da
Dec  5 08:33:09 LaptopMB kernel: [  113.156189]  [<ffffffff80491fc4>] ? _spin_lock+0xe/0x11
Dec  5 08:33:09 LaptopMB kernel: [  113.156196]  [<ffffffff802d27f8>] sys_ioctl+0x5a/0x7c
Dec  5 08:33:09 LaptopMB kernel: [  113.156206]  [<ffffffff8021110a>] system_call_fastpath+0x16/0x1b
Dec  5 08:33:09 LaptopMB kernel: [  113.156211] ---[ end trace d99277641ba4b26c ]---

Hope you can fix it.
Comment 3 Eric Anholt 2008-12-05 10:25:50 UTC
The summary of this bug should be "GM45 hung on driWaitForVBlank in the server"

Loading an fb driver along with the intel DRM driver has never been a supported configuration.  We're planning on fixing that in 2.6.29 by obsoleting fbdev.  Try removing vesafb and seeing if things work.  It'll clear up those warnings you've got, and probably be a significant boost to system performance.

If that doesn't fix it, what's in /proc/dri/i915_gem_interrupt when you're hung?
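
For the vesafb warning mentioned in this comment, here is a small sketch that pulls the address ranges out of the "resource map sanity check conflict" line from Comment 2 and reports how much they overlap. The reading of the four numbers (requested range first, then the range already claimed by the named owner) is an assumption based on the message layout, and `parse_conflict` is a hypothetical helper name.

```python
import re

# Matches the kernel message quoted in Comment 2; the field meaning is assumed.
CONFLICT_RE = re.compile(
    r"resource map sanity check conflict: "
    r"(0x[0-9a-fA-F]+) (0x[0-9a-fA-F]+) (0x[0-9a-fA-F]+) (0x[0-9a-fA-F]+) (\S+)"
)

def parse_conflict(line):
    """Return the two ranges, the owner name and the overlap size, or None."""
    match = CONFLICT_RE.search(line)
    if match is None:
        return None
    req_start, req_end, res_start, res_end = (int(g, 16) for g in match.groups()[:4])
    # Both ranges are inclusive, hence the +1 when computing the overlap size.
    overlap = max(0, min(req_end, res_end) - max(req_start, res_start) + 1)
    return {
        "requested": (req_start, req_end),
        "existing": (res_start, res_end),
        "owner": match.group(5),
        "overlap_bytes": overlap,
    }
```

Applied to the line from Comment 2, this names vesafb as the owner, with an overlap of 0x1ff0000 bytes starting at 0xd0000000.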
Comment 4 Michael 2008-12-05 10:36:43 UTC
Hi Eric

Now you've got me. I guess I don't understand much...

O.K. The vesafb is compiled into the kernel, if I am not mistaken. So, I need to recompile the kernel?

Second: I always thought the system needs a framebuffer device for the console when starting up until X is started. Is this wrong? And if it needs this, how would I remove the framebuffer and start X afterwards?

Sorry, in these fields I am quite a newbie.

Michael
Comment 5 Johannes Engel 2008-12-05 10:58:33 UTC
Try to boot with kernel option "vga=normal". Recompiling the kernel without vesafb would help, too.
Your system does not need a framebuffer device in general, but that depends on your needs.
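
Before recompiling, it can help to see which framebuffer options a kernel config actually enables. The sketch below scans the text of a kernel .config for CONFIG_FB* entries set to y or m; CONFIG_FB_VESA is the option that builds vesafb. The helper name `enabled_fb_options` is made up for this example.

```python
def enabled_fb_options(config_text):
    """Return a dict of CONFIG_FB* options that are built in (y) or modular (m)."""
    enabled = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        # Disabled options appear as comments: "# CONFIG_FB_VESA is not set".
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.startswith("CONFIG_FB") and value in ("y", "m"):
            enabled[key] = value
    return enabled

# Hypothetical usage, assuming the config is available as a plain-text file:
#   with open("/boot/config-2.6.28-rc7") as f:
#       print(enabled_fb_options(f.read()))
```

An empty result means no framebuffer driver is enabled in that config. A built-in vesafb (value y) cannot be unloaded at runtime, which is why the choices here are a boot option or a rebuild.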
Comment 6 Michael 2008-12-05 14:30:41 UTC
O.K. I managed to recompile the kernel without framebuffer. The results are mixed, however.

1.) No more "EQ overflowing" -> That's very good! (Problem solved)

2.) With option vga=791 there is no screen until X starts. With vga=normal only a 640x480 resolution is used (awful on a 1280x800 screen). No splashy support as splashy needs a framebuffer, it seems.

3.) X hangs after a resume from suspend to ram. Screen remains black with white mouse cursor, not reacting to any input. The computer, however, is still accessible via ssh. No related messages in /var/log/syslog. I have saved the Xorg log file.

4.) Kernel oopses when shutting down (seems to be the iwlagn driver).

Anyway, points 3.) and 4.) are most probably not related to the original bug. As for 2.), I am not sure whether this is going in the right direction.
Will there be no splashy support from 2.6.28 on? Only 640x480 text console?
Comment 7 Michael 2008-12-05 14:34:00 UTC
Created attachment 19159 [details]
Xorg.0.log after resume from suspend to ram
Comment 8 Eric Anholt 2008-12-05 14:48:09 UTC
We really can't make a reliable graphics driver when you're voluntarily calling closed-source BIOS code to randomly stomp all over your chip's state.  At least with intelfb, we could figure out what it was breaking and fix it.

Do you have userland code POSTing the chip after resume (probably driven by hal quirks)?  This smashes graphics state, and is unnecessary.  If that's not the problem, then we'll probably be interested in the backtrace, and the output of cat /proc/dri/0/i915_gem_interrupt
Comment 9 Michael 2008-12-05 15:06:07 UTC
Hi Eric

Ehhmmm, I'm afraid I don't know whether I have any closed-source BIOS code randomly stomping over my chip's state.

Actually, I have just installed Sidux (a Sid-derived distro) from its installation DVD. I needed to upgrade the intel driver to the version in the experimental repo (this was a tip from thinkwiki) to get it working with the graphics chip.

In general, everything works great with 2.6.27.7, including suspend to ram. The only problem I have is with the iwlagn driver, which has a bug when lots of data is transferred. I read that the bug is solved in 2.6.28. That's why I tried to install 2.6.28, which however led to the above problem.

So, how can I find out if there is any userland code POSTing the chip after resume? I have done a "lshal|grep quirk" with no output.

I can post the output of cat /proc/dri/0/i915_gem_interrupt, if that helps. I have no idea, however, how to make a backtrace.

Sorry that I am not of much help.
Comment 10 Michael 2008-12-05 15:25:43 UTC
O.K. the hang after resume has nothing to do with the suspend-resume process. It also happens when you just switch to VT1 via CTRL-ALT-F1 and back with CTRL-ALT-F7. Same effect and same Xorg log file.

Maybe I have done something wrong when I disabled vesafb?

I will post i915_gem_interrupts, kernel config and xorg.conf.
Comment 11 Michael 2008-12-05 15:27:25 UTC
Created attachment 19163 [details]
Kernel config of 2.6.28-rc7
Comment 12 Michael 2008-12-05 15:28:13 UTC
Created attachment 19164 [details]
xorg.conf
Comment 13 Michael 2008-12-05 15:30:03 UTC
Oops, I messed up the cat /proc/dri/0/i915_gem_interrupts output. I need to redo it tomorrow. It's already very late here.

Thanks,

Michael
Comment 14 Eric Anholt 2008-12-05 15:33:45 UTC
If you're hanging on VT switch, that's probably our 2D driver's fault and should be an easy fix.  A backtrace of the server may actually help there.
Comment 15 Michael 2008-12-06 00:21:50 UTC
O.k., now after several dozen reboots, I think I have managed to get some more data. I followed the instructions for xserver debugging, although I do not understand much of it.

One problem is certainly that the server does not crash, but hangs. So I am not sure whether the backtrace helps (it is very short). I did one before the hang and one after the hang.

Furthermore, I attach an strace log which I recorded during the hang, meaning I started the strace, then switched to VT1 (which works) and then back to VT7 (which gives the hung server).

It's not much, but it may give some clues.
Comment 16 Michael 2008-12-06 00:23:17 UTC
Created attachment 19171 [details]
/proc/dri/0/i915_gem_interrupts when server hangs
Comment 17 Michael 2008-12-06 00:25:03 UTC
Created attachment 19172 [details]
backtrace before hang
Comment 18 Michael 2008-12-06 00:25:24 UTC
Created attachment 19173 [details]
backtrace after hang
Comment 19 Michael 2008-12-06 00:26:03 UTC
Created attachment 19174 [details]
strace when hanging server (switch to VT1 and then to VT7)
Comment 20 Michael 2008-12-06 00:26:45 UTC
Created attachment 19175 [details]
Xorg.0.log from hanged server
Comment 21 krcroft 2008-12-15 13:09:35 UTC
Confirmed on 2.6.28-rc8 with an Intel G33 chipset.

[mi] EQ overflowing. The server is probably stuck in an infinite loop.
[mi] mieqEnequeue: out-of-order valuator event; dropping.
[mi] EQ overflowing. The server is probably stuck in an infinite loop.
[mi] EQ overflowing. The server is probably stuck in an infinite loop.
[mi] mieqEnequeue: out-of-order valuator event; dropping.
[mi] EQ overflowing. The server is probably stuck in an infinite loop.
[mi] EQ overflowing. The server is probably stuck in an infinite loop.

2.6.27.9 works fine with the same .config.  Some notable options:

CONFIG_PREEMPT=y
CONFIG_PREEMPT_RCU=y
CONFIG_MTRR=y

no frame buffer drivers.
Comment 22 Martin Olsson 2008-12-15 14:06:04 UTC
Confirmed on 2.6.28-2-generic (the current Ubuntu jaunty kernel)
on a machine with an Intel G45 (Gigabyte GA-EG45M-DS2H board).
I also don't have fb compiled into the kernel.

I see this bug 1-3 times per day. In around 75% of cases
my xorg_log has the EQ spam like this:
[mi] mieqEnequeue: out-of-order valuator event; dropping.
[mi] EQ overflowing. The server is probably stuck in an infinite loop.

I've seen it several times without that stuff in xorg_log though (but still the xserver has been stuck in the exact same backtrace), i.e. it looks like this:

#0  0x00007f26b8221d87 in ioctl () from /lib/libc.so.6
#1  0x00007f26b6dfe8d3 in ?? () from /usr/lib/libdrm.so.2
#2  0x00007f26b6dfed70 in drmWaitVBlank () from /usr/lib/libdrm.so.2
#3  0x00007f26a5ccb85e in ?? () from /usr/lib/dri/i965_dri.so
#4  0x00007f26a5ccbaf0 in driWaitForVBlank () from /usr/lib/dri/i965_dri.so
#5  0x00007f26a5cd53d5 in intelSwapBuffers () from /usr/lib/dri/i965_dri.so
...

More details from my system are available at:
http://bugs.freedesktop.org/show_bug.cgi?id=18922
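
To compare backtraces across reports, the following sketch checks whether a trace contains the frame sequence seen in this bug (ioctl, drmWaitVBlank, driWaitForVBlank, intelSwapBuffers, innermost first). It accepts both the gdb format shown in this comment and the Xorg log format from the original description. The function names are hypothetical, and a match only indicates a similar call path, not necessarily the same root cause.

```python
import re

# Frame order as it appears in the traces in this report (innermost first).
HANG_FRAMES = ["ioctl", "drmWaitVBlank", "driWaitForVBlank", "intelSwapBuffers"]

def symbols_in_backtrace(text):
    """Extract function names from gdb-style or Xorg-log-style frame lines."""
    symbols = []
    for line in text.splitlines():
        # gdb: "#2  0x... in drmWaitVBlank () from /usr/lib/libdrm.so.2"
        match = re.search(r"\bin (\w+) \(\)", line)
        if match is None:
            # Xorg log: "10: /usr/lib/libdrm.so.2(drmWaitVBlank+0x20) [0x...]"
            match = re.search(r"\((\w+)\+0x[0-9a-fA-F]+\)", line)
        if match:
            symbols.append(match.group(1))
    return symbols

def looks_like_vblank_hang(text):
    """True if HANG_FRAMES occur in order (not necessarily adjacent) in the trace."""
    remaining = iter(symbols_in_backtrace(text))
    # "frame in remaining" consumes the iterator, so order is enforced.
    return all(frame in remaining for frame in HANG_FRAMES)
```

Unresolved frames (shown as `??` by gdb or as a bare address in the Xorg log) are skipped, so the check tolerates missing debug symbols.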
Comment 23 Khashayar Naderehvandi 2009-01-11 14:30:09 UTC
This bug should be marked resolved.
https://bugs.freedesktop.org/show_bug.cgi?id=18922.
Comment 24 Michael 2009-02-21 00:38:30 UTC
Yesterday I tested the new 2.6.29-rc5 release and also used the newest intel driver in Debian (2.6.1).

The problem seems to be solved for me. 2.6.29-rc5 works great, except that there is a bug after a suspend/resume cycle (3D performance gets VERY choppy).

Not sure where to look for the bug (kernel, Xorg, drivers or mesa?). But it works perfectly before suspending/resuming. After a suspend something seems to be borked.

Btw.: Doesn't seem to make a difference whether I use a framebuffer or not (vga=normal). 
Comment 25 Daniel Klaffenbach 2009-03-31 13:06:31 UTC
I have the same problem with a Radeon Xpress 200M on 2.6.29 and DRI. I am not using any framebuffer:

(II) RADEON(0): Output: S-video, Detected Monitor Type: 0                                                     
[mi] EQ overflowing. The server is probably stuck in an infinite loop.                                        

Backtrace:
0: /usr/bin/X(xorg_backtrace+0x37) [0x81353d7]
[mi] mieqEnequeue: out-of-order valuator event; dropping.
[mi] EQ overflowing. The server is probably stuck in an infinite loop.
[mi] mieqEnequeue: out-of-order valuator event; dropping.             
[mi] EQ overflowing. The server is probably stuck in an infinite loop.
Comment 26 Gordon Jin 2009-09-18 02:38:54 UTC
closing as the original problem has been fixed.

Michael, if you have other issues, please file a new bug according to http://www.intellinuxgraphics.org/how_to_report_bug.html. 

Kernel 2.6.31 plus xf86-video-intel 2.8.x are recommended at this point.