Bug 33662

Summary: System hangs during X startup (kwin compositing?)
Product: Drivers
Reporter: Luke-Jr (luke-jr+linuxbugs)
Component: Video(DRI - Intel)
Assignee: drivers_video-dri-intel (drivers_video-dri-intel)
Severity: blocking
CC: florian, kirr, maciej.rutecki, rjw
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.38
Tree: Mainline
Regression: Yes
Bug Depends on:    
Bug Blocks: 27352    
Attachments: dmesg-

Description Luke-Jr 2011-04-19 04:26:00 UTC
Trying to boot my system with Linux 2.6.38 results in it hanging during KDE's first step (out of 5). The hard drives do not sync, and my screen freezes with the KDE startup, so I have no useful logs. Everything works fine in 2.6.37.

CPU: Intel i5-2400
GPU: Intel 2nd Gen HD Graphics 2000
Motherboard: Gigabyte GA-H67A-UD3H (*with* SATA problem, will be returned soon for *different model*)
GPU: ATi Radeon 5850 (no drivers loaded at all)

xf86-video-intel: 2.15.0
mesa: 7.10.2[-r1]
xorg-server: 1.9.4
Comment 1 Kirill Smelkov 2011-04-25 20:36:59 UTC
I am also observing a similar X freeze on startup here on a Debian Lenny based system: X tries to start, the monitor backlight blinks once, and then the whole system freezes. The display is black, the keyboard does not respond (sysrq does not work), and there is no network (i.e. ping from another host gets no reply).

It used to work with and earlier kernels without problem.

Attaching dmesg and X logs for the good and bad cases, taken over ssh from a nearby machine.

Thanks beforehand,

CPU: Intel Atom N270
GPU: Intel Mobile 945GM/GMS/GME, 943/940GML Express Integrated Graphics Controller

libdrm2                     2.3.1-2
mesa                        7.0.3-7
xserver-xorg-core           1.4.2-10
xserver-xorg-video-intel    2.3.2-2+lenny8

Kernel built with:

    gcc (Debian 4.3.2-1.1) 4.3.2

P.S. Judging by the `X -verbose` difference, the freeze happens near some graphics-related initialization:

--- Xverbose-   2011-04-26 00:19:50.000000000 +0400
+++ Xverbose-   2011-04-26 00:08:47.000000000 +0400
@@ -7,7 +7,7 @@
 Release Date: 11 June 2008
 X Protocol Version 11, Revision 0
 Build Operating System: Linux Debian (xorg-server 2:1.4.2-10.lenny3)
-Current Operating System: Linux navy3 #1 PREEMPT Mon Apr 25 23:43:58 MSD 2011 i686
+Current Operating System: Linux navy3 #1 PREEMPT Mon Apr 25 23:21:25 MSD 2011 i686
 Build Date: 25 September 2010  12:05:44PM

         Before reporting problems, check http://wiki.x.org

@@ -297,50 +297,5 @@
 (II) intel(0):   Output TV is connected to pipe none
 (II) intel(0): [drm] dma control initialized, using IRQ 16
 (II) intel(0): RandR 1.2 enabled, ignore the following RandR disabled message.
-(II) intel(0): Selecting standard 18 bit TMDS pixel format.
-(II) intel(0): DPMS enabled
-(II) intel(0): Set up textured video
-(II) intel(0): Set up overlay video
-(II) intel(0): direct rendering: Enabled
-(WW) intel(0): Option "passwordFile" is not used
-(--) RandR disabled
-(WW) AIGLX: 3D driver claims to not support visual 0x23
-(WW) AIGLX: 3D driver claims to not support visual 0x24
-(WW) AIGLX: 3D driver claims to not support visual 0x25
-(WW) AIGLX: 3D driver claims to not support visual 0x26
-(WW) AIGLX: 3D driver claims to not support visual 0x27
-(WW) AIGLX: 3D driver claims to not support visual 0x28
-(WW) AIGLX: 3D driver claims to not support visual 0x29
-(WW) AIGLX: 3D driver claims to not support visual 0x2a
-(WW) AIGLX: 3D driver claims to not support visual 0x2b
-(WW) AIGLX: 3D driver claims to not support visual 0x2c
-(WW) AIGLX: 3D driver claims to not support visual 0x2d
-(WW) AIGLX: 3D driver claims to not support visual 0x2e
-(WW) AIGLX: 3D driver claims to not support visual 0x2f
-(WW) AIGLX: 3D driver claims to not support visual 0x30
-(WW) AIGLX: 3D driver claims to not support visual 0x31
-(WW) AIGLX: 3D driver claims to not support visual 0x32
-(II) AIGLX: Loaded and initialized /usr/lib/dri/i915_dri.so
-(II) GLX: Initialized DRI GL provider for screen 0
-(II) intel(0): Setting screen physical size to 338 x 270
-(**) Generic Keyboard: always reports core events
-(**) Generic Keyboard: Protocol: standard
-(**) Generic Keyboard: XkbRules: "xorg"
-(**) Generic Keyboard: XkbModel: "pc105"
-(**) Generic Keyboard: XkbLayout: "us,ru"
-(**) Generic Keyboard: XkbOptions: "grp:ctrl_shift_toggle,grp_led:scroll"
-(**) Generic Keyboard: CustomKeycodes disabled
-(**) Configured Mouse: Device: "/dev/input/mice"
-(**) Configured Mouse: Protocol: "ImPS/2"
-(**) Configured Mouse: always reports core events
-(==) Configured Mouse: Emulate3Buttons, Emulate3Timeout: 50
-(**) Configured Mouse: ZAxisMapping: buttons 4 and 5
-(**) Configured Mouse: Buttons: 9
-(**) Configured Mouse: Sensitivity: 1
-(II) evaluating device (Generic Keyboard)
-(II) XINPUT: Adding extended input device "Generic Keyboard" (type: KEYBOARD)
-(II) evaluating device (Configured Mouse)
-(II) XINPUT: Adding extended input device "Configured Mouse" (type: MOUSE)
-(II) Configured Mouse: ps2EnableDataReporting: succeeded

-# X is up and running ok
+# no new messages here, the machine seems to be frozen
Comment 2 Kirill Smelkov 2011-04-25 20:38:17 UTC
Created attachment 55432
Comment 3 Kirill Smelkov 2011-04-25 20:39:05 UTC
Created attachment 55442
Comment 4 Kirill Smelkov 2011-04-25 20:39:43 UTC
Created attachment 55452
Comment 5 Kirill Smelkov 2011-04-25 20:40:12 UTC
Created attachment 55462
Comment 6 Kirill Smelkov 2011-04-25 20:40:45 UTC
Created attachment 55472
Comment 7 Kirill Smelkov 2011-04-25 20:41:27 UTC
Created attachment 55482
Comment 8 Kirill Smelkov 2011-04-25 20:45:46 UTC
Forgot to mention: both 2.6.37 and 2.6.38.1 through 2.6.38.4 work OK on another Debian Lenny based machine with a similar software setup but different hardware, including more modern graphics:

    +-02.0  Intel Corporation 4 Series Chipset Integrated Graphics Controller
    +-02.1  Intel Corporation 4 Series Chipset Integrated Graphics Controller

On that machine X uses the VESA gfx driver.
Comment 9 Kirill Smelkov 2011-05-05 14:41:13 UTC
Using netconsole I see the following BUG on

[   50.763450] BUG: unable to handle kernel  NULL pointer dereference  at 00000084
[   50.763478] IP:  [<c11fa50a>] i915_driver_irq_handler+0x12a/0xab0
[   50.763501] *pde = 00000000  
[   50.763511] Oops: 0000 [#1]  PREEMPT  
[   50.763522] last sysfs file: /sys/devices/virtual/dmi/id/board_asset_tag
[   50.763530] Modules linked in:  netconsole  3c59x  bttv  v4l2_common  videodev  videobuf_dma_sg  videobuf_core  btcx_risc  tveeprom  [last unloaded: scsi_wait_scan] 
[   50.763573] 
[   50.763581] Pid: 3562, comm: Xorg Not tainted #1   ICP / iEi PCISA-945GSE / PCISA-945GSE(B125) 
[   50.763604] EIP: 0060:[<c11fa50a>] EFLAGS: 00213082 CPU: 0
[   50.763613] EIP is at i915_driver_irq_handler+0x12a/0xab0
[   50.763620] EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: dffe1898
[   50.763628] ESI: 00000000 EDI: de530000 EBP: 00203013 ESP: dec09f4c
[   50.763635]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[   50.763646] Process Xorg (pid: 3562, ti=dec08000 task=dec2e040 task.ti=def6a000)
[   50.763652] Stack:
[   50.763657]  00000002  c142c8e6  c139c2a4  c142d614  c123747c  de030400  de02c400  c12374ce
Comment 10 Kirill Smelkov 2011-05-06 10:34:28 UTC
I've bisected this to

commit e8616b6ced6137085e6657cc63bc2fe3900b8616
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Jan 20 09:57:11 2011 +0000

    drm/i915: Initialise ring vfuncs for old DRI paths
    We weren't setting up the vfunc table when initialising the old DRI
    ringbuffer, leading to such OOPSes as:
    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [<(null)>] (null)
    PGD 10c441067 PUD 1185e5067 PMD 0
    Oops: 0010 [#1] PREEMPT SMP
    last sysfs file: /sys/class/dmi/id/chassis_asset_tag
    CPU 3
    Modules linked in: i915 drm_kms_helper drm fb fbdev i2c_algo_bit
    cfbcopyarea video backlight output cfbimgblt cfbfillrect autofs4 ipv6
    nfs lockd fscache nfs_acl auth_rpcgss sunrpc coretemp hwmon_vid mousedev
    usbhid hid option usb_wwan snd_hda_codec_via asus_atk0110 atl1e
    usbserial snd_hda_intel snd_hda_codec firmware_class snd_hwdep snd_pcm
    snd_seq snd_timer snd_seq_device processor parport_pc thermal snd
    thermal_sys parport 8250_pnp button rng_core rtc_cmos shpchp hwmon
    rtc_core ehci_hcd pci_hotplug uhci_hcd soundcore tpm_tis i2c_i801
    rtc_lib tpm serio_raw snd_page_alloc tpm_bios i2c_core usbcore psmouse
    intel_agp sg pcspkr sr_mod evdev cdrom ext3 jbd mbcache dm_mod sd_mod
    ata_piix libata scsi_mod unix
    Jan 18 15:49:29 lithui kernel:
    Pid: 3605, comm: Xorg Not tainted #5 P5KPL-CM/System Product
    RIP: 0010:[<0000000000000000>]  [<(null)>] (null)
    RSP: 0018:ffff8801150d1d40  EFLAGS: 00010202
    RAX: 000000000001ffff RBX: ffff88011a011b00 RCX: 000000000001a704
    RDX: ffff880118566028 RSI: ffff880118566028 RDI: ffff880117876800
    RBP: ffff8801150d1d48 R08: ffff8801195fe300 R09: 00000000c0086444
    R10: 0000000000000001 R11: 0000000000003206 R12: ffff880117876800
    R13: ffff880118566000 R14: ffff880117876820 R15: ffff8801150d1df8
    FS:  00007f1038d456e0(0000) GS:ffff880001780000(0000)
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 00000001187e7000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process Xorg (pid: 3605, threadinfo ffff8801150d0000, task
    ffffffffa043b8e6 ffff8801150d1d98 ffffffffa041768b dead000000000000
    <0> 0000000000000048 00007f1023f2a000 0000000000000044 0000000000000008
    <0> ffff88010d26bd80 ffff880117876800 ffff8801150d1df8 ffff8801150d1ea8
    Call Trace:
    [<ffffffffa043b8e6>] ? intel_ring_advance+0x16/0x20 [i915]
    [<ffffffffa041768b>] i915_irq_emit+0x15b/0x240 [i915]
    [<ffffffffa03ea7b1>] drm_ioctl+0x1f1/0x460 [drm]
    [<ffffffffa0417530>] ? i915_irq_emit+0x0/0x240 [i915]
    [<ffffffff810dd8f1>] ? do_sync_read+0xd1/0x120
    [<ffffffff81025b1f>] ? do_page_fault+0x1df/0x3d0
    [<ffffffff810ed5c7>] do_vfs_ioctl+0x97/0x550
    [<ffffffff8115c2ea>] ? security_file_permission+0x7a/0x90
    [<ffffffff810edb19>] sys_ioctl+0x99/0xa0
    [<ffffffff810024ab>] system_call_fastpath+0x16/0x1b
    Code:  Bad RIP value.
    RIP  [<(null)>] (null)
    RSP <ffff8801150d1d40>
    CR2: 0000000000000000
    Reported-by: Herbert Xu <herbert@gondor.apana.org.au>
    Tested-by: Herbert Xu <herbert@gondor.apana.org.au>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=29153
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=23172
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: stable@kernel.org
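
The commit message above describes a function-pointer ("vfunc") table that one initialisation path left unset, so that a later call through it jumped to address 0 (RIP (null) and CR2 0000000000000000 in the quoted oops). The following is only a minimal C sketch of that failure pattern and of a defensive check, not the actual i915 code: the struct, the field names and the two init functions are all hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified ring buffer; not the real i915 structure. */
struct ring {
    unsigned int tail;
    /* "vfunc": a per-ring operation reached through a function pointer. */
    void (*write_tail)(struct ring *ring, unsigned int value);
};

static void default_write_tail(struct ring *ring, unsigned int value)
{
    ring->tail = value;
}

/* Init path that forgets the vfunc, analogous to the bug described above. */
static void ring_init_legacy_broken(struct ring *ring)
{
    assert(ring != NULL);
    ring->tail = 0;
    ring->write_tail = NULL;
}

/* Init path that fills in the vfunc, analogous to the fix. */
static void ring_init_legacy_fixed(struct ring *ring)
{
    assert(ring != NULL);
    ring->tail = 0;
    ring->write_tail = default_write_tail;
}

/*
 * Advance the ring. Returns 0 on success, or -1 if the vfunc was never set.
 * Without this check the call would jump to address 0.
 */
static int ring_advance(struct ring *ring, unsigned int value)
{
    assert(ring != NULL);
    if (ring->write_tail == NULL)
        return -1;
    ring->write_tail(ring, value);
    return 0;
}
```

In this sketch, ring_advance() on a ring set up by ring_init_legacy_broken() returns -1 instead of crashing, while the same call on a ring set up by ring_init_legacy_fixed() updates the tail. Whether a runtime check or a complete init path is preferable is a design choice; the commit quoted above takes the second route by initialising the vfuncs.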
Comment 11 Kirill Smelkov 2011-05-06 10:46:41 UTC
Here is how it BUGs with relevant disassembly (kernel compiled with debug info):

[   92.113090] BUG: unable to handle kernel  NULL pointer dereference  at 00000084
[   92.113115] IP:  [<c11efb2f>] i915_driver_irq_handler+0x11f/0xa70
[   92.113136] *pde = 00000000  
[   92.113145] Oops: 0000 [#1]  PREEMPT  
[   92.113157] last sysfs file: /sys/devices/virtual/dmi/id/board_asset_tag
[   92.113166] Modules linked in: 
[   92.113175] 
[   92.113184] Pid: 0, comm: swapper Not tainted 2.6.37--NAVY-08012-ge8616b6 #23 PCISA-945GSE(B125)/PCISA-945GSE
[   92.113194] EIP: 0060:[<c11efb2f>] EFLAGS: 00010082 CPU: 0
[   92.113203] EIP is at i915_driver_irq_handler+0x11f/0xa70
[   92.113211] EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: dffe1898
[   92.113219] ESI: ded38000 EDI: 00000000 EBP: dec09fc0 ESP: dec09f3c
[   92.113227]  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[   92.113236] Process swapper (pid: 0, ti=dec08000 task=c1464340 task.ti=c145e000)
[   92.113242] Stack:
[   92.113248]  00000002  c140d160  c1395d0d  c140df49  de458e00  dec09f64  c12335ef  00004000 
[   92.113276]  de457400  de458e00  dec09f90  c123c958  dec09f84  de458fbc  ded382c4  ded3801c 

c11efa10 <i915_driver_irq_handler>:
                intel_prepare_page_flip(dev, intel_crtc->plane);

irqreturn_t i915_driver_irq_handler(DRM_IRQ_ARGS)


                ret = IRQ_HANDLED;

                /* Consume port.  Then clear IIR or we'll miss events */
                if ((I915_HAS_HOTPLUG(dev)) &&
c11efadd:       8b 55 c0                mov    -0x40(%ebp),%edx
c11efae0:       8b 82 f4 01 00 00       mov    0x1f4(%edx),%eax
c11efae6:       8b 40 04                mov    0x4(%eax),%eax
c11efae9:       f6 40 02 10             testb  $0x10,0x2(%eax)
c11efaed:       74 0c                   je     c11efafb <i915_driver_irq_handler+0xeb>
c11efaef:       f7 c7 00 00 02 00       test   $0x20000,%edi
c11efaf5:       0f 85 1d 02 00 00       jne    c11efd18 <i915_driver_irq_handler+0x308>
c11efafb:       8b 46 10                mov    0x10(%esi),%eax
c11efafe:       05 a4 20 00 00          add    $0x20a4,%eax
c11efb03:       89 38                   mov    %edi,(%eax)
{ asm volatile("mov" size " %0,%1": :reg (val), \
"m" (*(volatile type __force *)addr) barrier); }

build_mmio_read(readb, "b", unsigned char, "=q", :"memory")
build_mmio_read(readw, "w", unsigned short, "=r", :"memory")
build_mmio_read(readl, "l", unsigned int, "=r", :"memory")
c11efb05:       8b 46 10                mov    0x10(%esi),%eax
c11efb08:       05 a4 20 00 00          add    $0x20a4,%eax
c11efb0d:       8b 18                   mov    (%eax),%ebx

                I915_WRITE(IIR, iir);
                new_iir = I915_READ(IIR); /* Flush posted writes */

                if (dev->primary->master) {
c11efb0f:       8b 55 c0                mov    -0x40(%ebp),%edx
c11efb12:       8b 82 20 02 00 00       mov    0x220(%edx),%eax
c11efb18:       8b 80 e4 00 00 00       mov    0xe4(%eax),%eax
c11efb1e:       85 c0                   test   %eax,%eax
c11efb20:       74 19                   je     c11efb3b <i915_driver_irq_handler+0x12b>
                        master_priv = dev->primary->master->driver_priv;
                        if (master_priv->sarea_priv)
c11efb22:       8b 40 5c                mov    0x5c(%eax),%eax
c11efb25:       8b 50 04                mov    0x4(%eax),%edx
c11efb28:       85 d2                   test   %edx,%edx
c11efb2a:       74 0f                   je     c11efb3b <i915_driver_irq_handler+0x12b>
                                master_priv->sarea_priv->last_dispatch =
c11efb2c:       8b 46 4c                mov    0x4c(%esi),%eax
c11efb2f:       8b 80 84 00 00 00       mov    0x84(%eax),%eax
c11efb35:       89 82 08 08 00 00       mov    %eax,0x808(%edx)

                if (iir & I915_USER_INTERRUPT)
c11efb3b:       f7 c7 02 00 00 00       test   $0x2,%edi
c11efb41:       0f 85 b9 01 00 00       jne    c11efd00 <i915_driver_irq_handler+0x2f0>
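
Note on the fault address: the faulting instruction at c11efb2f is `mov 0x84(%eax),%eax`, executed with EAX: 00000000 right after `mov 0x4c(%esi),%eax`. So a pointer loaded from offset 0x4c of the structure in %esi is NULL, and the 32-bit read at byte offset 0x84 from it faults at address 00000084. One possible reading, which is an assumption and not confirmed anywhere in this report, is that the pointer is the base of a status page made of 32-bit slots and that the value being read sits in slot 0x21, since 0x21 * 4 = 0x84. The C sketch below only illustrates that arithmetic and a NULL check; BREADCRUMB_INDEX and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: slot index chosen so that index * 4 matches the 0x84 offset. */
#define BREADCRUMB_INDEX 0x21

/* Byte offset of 32-bit slot `index` from the start of a status page. */
static size_t status_slot_offset(size_t index)
{
    return index * sizeof(uint32_t);
}

/*
 * Address the CPU would access for slot `index`, computed on integers so
 * that no invalid pointer is ever formed. With page_base == 0 this is the
 * address reported in the oops.
 */
static uintptr_t status_slot_address(uintptr_t page_base, size_t index)
{
    return page_base + status_slot_offset(index);
}

/* Guarded read: returns 0 and stores the slot value, or -1 if page is NULL. */
static int read_status_slot(const volatile uint32_t *page, size_t index,
                            uint32_t *out)
{
    assert(out != NULL);
    if (page == NULL)
        return -1;
    *out = page[index];
    return 0;
}
```

With a NULL page, status_slot_address(0, BREADCRUMB_INDEX) evaluates to 0x84, matching "NULL pointer dereference at 00000084", and read_status_slot() returns -1 rather than faulting.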
Comment 12 Rafael J. Wysocki 2011-05-14 22:20:18 UTC
First-Bad-Commit : e8616b6ced6137085e6657cc63bc2fe3900b8616
Comment 13 Kirill Smelkov 2011-05-20 16:08:46 UTC
With v2.6.39 and the same userspace setup it does not panic, but X refuses to start at all:

    (II) intel(0): Selecting standard 18 bit TMDS pixel format.
    (II) intel(0): Output configuration:
    (II) intel(0):   Pipe A is on
    (II) intel(0):   Display plane A is now enabled and connected to pipe A.
    (II) intel(0):   Pipe B is on
    (II) intel(0):   Display plane B is now enabled and connected to pipe B.
    (II) intel(0):   Output VGA is connected to pipe A
    (II) intel(0):   Output LVDS is connected to pipe B
    (II) intel(0):   Output TV is connected to pipe none
    (EE) intel(0): [drm] failure adding irq handler
    (II) intel(0): [drm] removed 1 reserved context for kernel
    (II) intel(0): [drm] unmapping 8192 bytes of SAREA 0xdfff7000 at 0xb7252000
    (II) intel(0): [drm] Closed DRM master.

    Fatal server error:
    AddScreen/ScreenInit failed for driver 0

with no messages in dmesg.
Comment 14 Kirill Smelkov 2011-05-20 16:28:46 UTC
What's going on? Are we breaking more and more stuff with each release? Also I thought there is "no regressions" rule, but even detailed bisected bugreports stay without reply...
Comment 15 Luke-Jr 2011-05-20 17:03:55 UTC
Elevating to blocking, since this basically makes the system useless... I no longer have my original Gigabyte motherboard, and will test against my new motherboard when I am forced to reboot it again.
Comment 16 Luke-Jr 2011-05-21 19:15:40 UTC
As I mentioned, I no longer have this motherboard. Furthermore, Chris Wilson explains that Kirill Smellkov's details/bisection are for an unrelated bug. Therefore, unless someone else can reproduce this bug, it should probably be RESOLVED UNREPRODUCABLE.
Comment 17 Kirill Smelkov 2011-05-28 10:39:55 UTC
Luke, could you please point me to where Chris Wilson "explains that Kirill Smellkov's details/bisection are for an unrelated bug"?

Yes, my bug may be different from your original problem (that's my fault), but I still can't see any reply from Chris here.

Anyway, I've created a new, separate bugzilla entry for the NULL pointer dereference in i915_driver_irq_handler on X startup:

    bug36052  "System hang during X startup (non-kms, regression, bisected)"
Comment 18 Luke-Jr 2011-05-28 12:37:13 UTC
Comment 19 Kirill Smelkov 2011-05-28 14:02:59 UTC