Booting my system with Linux 2.6.38 hangs during KDE's first startup step (out of 5). The hard drives do not sync, and the screen freezes on the KDE startup screen, so I have no useful logs. Everything works fine with 2.6.37.

CPU: Intel i5-2400
GPU: Intel 2nd Gen HD Graphics 2000
Motherboard: Gigabyte GA-H67A-UD3H (*with* SATA problem, will be returned soon for a *different model*)
GPU: ATi Radeon 5850 (no drivers loaded at all)
xf86-video-intel: 2.15.0
mesa: 7.10.2[-r1]
xorg-server: 1.9.4
I am also observing a similar X freeze on startup here, on a Debian Lenny based system. X tries to start, the monitor backlight blinks once, and then the whole system freezes: the display is black, the keyboard does not respond (sysrq does not work), and there is no network (i.e. ping from another host gets no reply).

It used to work with 2.6.37.6 and earlier kernels without problems.

Attaching dmesg and X logs for the good and bad cases, taken over ssh from a nearby machine.

Thanks beforehand,
Kirill

CPU: Intel Atom N270
GPU: Intel Mobile 945GM/GMS/GME, 943/940GML Express Integrated Graphics Controller
MB: iEi PCISA-945GSE

libdrm2 2.3.1-2
mesa 7.0.3-7
xserver-xorg-core 1.4.2-10
xserver-xorg-video-intel 2.3.2-2+lenny8

Kernel built with: gcc (Debian 4.3.2-1.1) 4.3.2

P.S. Judging by the `X -verbose` difference, the freeze happens around some graphics-related initialization:

--- Xverbose-2.6.37.6 2011-04-26 00:19:50.000000000 +0400
+++ Xverbose-2.6.38.4 2011-04-26 00:08:47.000000000 +0400
@@ -7,7 +7,7 @@
 Release Date: 11 June 2008
 X Protocol Version 11, Revision 0
 Build Operating System: Linux Debian (xorg-server 2:1.4.2-10.lenny3)
-Current Operating System: Linux navy3 2.6.37.6--NAVY-06233-gccce389 #1 PREEMPT Mon Apr 25 23:43:58 MSD 2011 i686
+Current Operating System: Linux navy3 2.6.38.4--NAVY-06833-g4878331 #1 PREEMPT Mon Apr 25 23:21:25 MSD 2011 i686
 Build Date: 25 September 2010 12:05:44PM
 Before reporting problems, check http://wiki.x.org
@@ -297,50 +297,5 @@
 (II) intel(0): Output TV is connected to pipe none
 (II) intel(0): [drm] dma control initialized, using IRQ 16
 (II) intel(0): RandR 1.2 enabled, ignore the following RandR disabled message.
-(II) intel(0): Selecting standard 18 bit TMDS pixel format.
-(II) intel(0): DPMS enabled
-(II) intel(0): Set up textured video
-(II) intel(0): Set up overlay video
-(II) intel(0): direct rendering: Enabled
-(WW) intel(0): Option "passwordFile" is not used
-(--) RandR disabled
-(WW) AIGLX: 3D driver claims to not support visual 0x23
-(WW) AIGLX: 3D driver claims to not support visual 0x24
-(WW) AIGLX: 3D driver claims to not support visual 0x25
-(WW) AIGLX: 3D driver claims to not support visual 0x26
-(WW) AIGLX: 3D driver claims to not support visual 0x27
-(WW) AIGLX: 3D driver claims to not support visual 0x28
-(WW) AIGLX: 3D driver claims to not support visual 0x29
-(WW) AIGLX: 3D driver claims to not support visual 0x2a
-(WW) AIGLX: 3D driver claims to not support visual 0x2b
-(WW) AIGLX: 3D driver claims to not support visual 0x2c
-(WW) AIGLX: 3D driver claims to not support visual 0x2d
-(WW) AIGLX: 3D driver claims to not support visual 0x2e
-(WW) AIGLX: 3D driver claims to not support visual 0x2f
-(WW) AIGLX: 3D driver claims to not support visual 0x30
-(WW) AIGLX: 3D driver claims to not support visual 0x31
-(WW) AIGLX: 3D driver claims to not support visual 0x32
-(II) AIGLX: Loaded and initialized /usr/lib/dri/i915_dri.so
-(II) GLX: Initialized DRI GL provider for screen 0
-(II) intel(0): Setting screen physical size to 338 x 270
-(**) Generic Keyboard: always reports core events
-(**) Generic Keyboard: Protocol: standard
-(**) Generic Keyboard: XkbRules: "xorg"
-(**) Generic Keyboard: XkbModel: "pc105"
-(**) Generic Keyboard: XkbLayout: "us,ru"
-(**) Generic Keyboard: XkbOptions: "grp:ctrl_shift_toggle,grp_led:scroll"
-(**) Generic Keyboard: CustomKeycodes disabled
-(**) Configured Mouse: Device: "/dev/input/mice"
-(**) Configured Mouse: Protocol: "ImPS/2"
-(**) Configured Mouse: always reports core events
-(==) Configured Mouse: Emulate3Buttons, Emulate3Timeout: 50
-(**) Configured Mouse: ZAxisMapping: buttons 4 and 5
-(**) Configured Mouse: Buttons: 9
-(**) Configured Mouse: Sensitivity: 1
-(II) evaluating device (Generic Keyboard)
-(II) XINPUT: Adding extended input device "Generic Keyboard" (type: KEYBOARD)
-(II) evaluating device (Configured Mouse)
-(II) XINPUT: Adding extended input device "Configured Mouse" (type: MOUSE)
-(II) Configured Mouse: ps2EnableDataReporting: succeeded
-# X is up and running ok
+# no new messages here, the machine seems to be frozen
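For anyone comparing good and bad logs the same way, the question the diff answers is "what is the last line the bad log shares with the good one, and what should have come next". A minimal Python sketch of that check follows; the function name `where_bad_log_stops` is made up for illustration and is not part of any X or kernel tooling.

```python
def where_bad_log_stops(good_lines, bad_lines):
    """Return (last_shared_line, next_expected_line).

    last_shared_line is the last line of the bad log that also occurs in
    the good log; next_expected_line is the line that follows it in the
    good log (None if the good log ends there). Returns (None, None) if
    the logs share no lines at all.
    """
    # Map each good-log line to its (last) position in the good log.
    position_in_good = {line: i for i, line in enumerate(good_lines)}
    for line in reversed(bad_lines):
        i = position_in_good.get(line)
        if i is not None:
            has_next = i + 1 < len(good_lines)
            return line, (good_lines[i + 1] if has_next else None)
    return None, None


# Short excerpts copied from the diff above, standing in for the full logs.
good = [
    "(II) intel(0): [drm] dma control initialized, using IRQ 16",
    "(II) intel(0): RandR 1.2 enabled, ignore the following RandR disabled message.",
    "(II) intel(0): Selecting standard 18 bit TMDS pixel format.",
    "(II) intel(0): DPMS enabled",
]
bad = good[:2]

last_shared, next_expected = where_bad_log_stops(good, bad)
print("bad log stops after:", last_shared)
print("good log continues with:", next_expected)
```

On the excerpts above this reports that the bad log ends right after the "RandR 1.2 enabled" message, i.e. shortly after the driver reports setting up the DRM IRQ.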
Created attachment 55432: dmesg-2.6.37.6
Created attachment 55442: dmesg-2.6.38.4
Created attachment 55452: Xverbose-2.6.37.6
Created attachment 55462: Xverbose-2.6.38.4
Created attachment 55472: Xorglog-2.6.37.6
Created attachment 55482: Xorglog-2.6.38.4
Forgot to mention: both 2.6.37 and 2.6.38.{1 to 4} work OK on another Debian Lenny based machine with a similar software setup but different hardware (including more modern graphics:

+-02.0 Intel Corporation 4 Series Chipset Integrated Graphics Controller
+-02.1 Intel Corporation 4 Series Chipset Integrated Graphics Controller

) and X using the VESA gfx driver...
Using netconsole I see the following BUG on 2.6.38.5:

[ 50.763450] BUG: unable to handle kernel NULL pointer dereference at 00000084
[ 50.763478] IP: [<c11fa50a>] i915_driver_irq_handler+0x12a/0xab0
[ 50.763501] *pde = 00000000
[ 50.763511] Oops: 0000 [#1] PREEMPT
[ 50.763522] last sysfs file: /sys/devices/virtual/dmi/id/board_asset_tag
[ 50.763530] Modules linked in: netconsole 3c59x bttv v4l2_common videodev videobuf_dma_sg videobuf_core btcx_risc tveeprom [last unloaded: scsi_wait_scan]
[ 50.763573]
[ 50.763581] Pid: 3562, comm: Xorg Not tainted 2.6.38.5--NAVY-06838-g355816e-dirty #1 ICP / iEi PCISA-945GSE / PCISA-945GSE(B125)
[ 50.763604] EIP: 0060:[<c11fa50a>] EFLAGS: 00213082 CPU: 0
[ 50.763613] EIP is at i915_driver_irq_handler+0x12a/0xab0
[ 50.763620] EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: dffe1898
[ 50.763628] ESI: 00000000 EDI: de530000 EBP: 00203013 ESP: dec09f4c
[ 50.763635] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[ 50.763646] Process Xorg (pid: 3562, ti=dec08000 task=dec2e040 task.ti=def6a000)
[ 50.763652] Stack:
[ 50.763657] 00000002 c142c8e6 c139c2a4 c142d614 c123747c de030400 de02c400 c12374ce
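To find the faulting instruction in a disassembly of vmlinux, the `IP:` line gives everything needed: the absolute address, the symbol, the offset into it, and the symbol's size. A small Python sketch that pulls these apart and computes the function's start address is below. The helper name `parse_oops_ip` is hypothetical, and the regular expression only targets the `IP: [<addr>] symbol+0xoff/0xsize` shape seen in this oops, so other kernel versions may need a different pattern.

```python
import re

# Matches e.g. "IP: [<c11fa50a>] i915_driver_irq_handler+0x12a/0xab0"
IP_LINE = re.compile(
    r"IP: \[<(?P<addr>[0-9a-f]+)>\] "
    r"(?P<symbol>[^\s+]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_oops_ip(line):
    """Return a dict describing the faulting IP, or None if no match."""
    m = IP_LINE.search(line)
    if m is None:
        return None
    addr = int(m.group("addr"), 16)
    offset = int(m.group("offset"), 16)
    size = int(m.group("size"), 16)
    return {
        "symbol": m.group("symbol"),
        "fault_addr": addr,
        "func_start": addr - offset,       # where the function begins
        "func_end": addr - offset + size,  # one past its last byte
    }


line = "[ 50.763478] IP: [<c11fa50a>] i915_driver_irq_handler+0x12a/0xab0"
info = parse_oops_ip(line)
print(info["symbol"], hex(info["func_start"]), hex(info["fault_addr"]))
```

The start and end addresses can then be given to `objdump -d --start-address=... --stop-address=...` on a vmlinux built from the same tree, which limits the output to the one function.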
I've bisected this to

commit e8616b6ced6137085e6657cc63bc2fe3900b8616
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Jan 20 09:57:11 2011 +0000

    drm/i915: Initialise ring vfuncs for old DRI paths

    We weren't setting up the vfunc table when initialising the old DRI
    ringbuffer, leading to such OOPSes as:

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [<(null)>] (null)
    PGD 10c441067 PUD 1185e5067 PMD 0
    Oops: 0010 [#1] PREEMPT SMP
    last sysfs file: /sys/class/dmi/id/chassis_asset_tag
    CPU 3
    Modules linked in: i915 drm_kms_helper drm fb fbdev i2c_algo_bit cfbcopyarea video backlight output cfbimgblt cfbfillrect autofs4 ipv6 nfs lockd fscache nfs_acl auth_rpcgss sunrpc coretemp hwmon_vid mousedev usbhid hid option usb_wwan snd_hda_codec_via asus_atk0110 atl1e usbserial snd_hda_intel snd_hda_codec firmware_class snd_hwdep snd_pcm snd_seq snd_timer snd_seq_device processor parport_pc thermal snd thermal_sys parport 8250_pnp button rng_core rtc_cmos shpchp hwmon rtc_core ehci_hcd pci_hotplug uhci_hcd soundcore tpm_tis i2c_i801 rtc_lib tpm serio_raw snd_page_alloc tpm_bios i2c_core usbcore psmouse intel_agp sg pcspkr sr_mod evdev cdrom ext3 jbd mbcache dm_mod sd_mod ata_piix libata scsi_mod unix

    Jan 18 15:49:29 lithui kernel: Pid: 3605, comm: Xorg Not tainted 2.6.36.2 #5 P5KPL-CM/System Product Name
    RIP: 0010:[<0000000000000000>] [<(null)>] (null)
    RSP: 0018:ffff8801150d1d40 EFLAGS: 00010202
    RAX: 000000000001ffff RBX: ffff88011a011b00 RCX: 000000000001a704
    RDX: ffff880118566028 RSI: ffff880118566028 RDI: ffff880117876800
    RBP: ffff8801150d1d48 R08: ffff8801195fe300 R09: 00000000c0086444
    R10: 0000000000000001 R11: 0000000000003206 R12: ffff880117876800
    R13: ffff880118566000 R14: ffff880117876820 R15: ffff8801150d1df8
    FS: 00007f1038d456e0(0000) GS:ffff880001780000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 00000001187e7000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process Xorg (pid: 3605, threadinfo ffff8801150d0000, task ffff88011b016e40)
    Stack:
    ffffffffa043b8e6 ffff8801150d1d98 ffffffffa041768b dead000000000000
    <0> 0000000000000048 00007f1023f2a000 0000000000000044 0000000000000008
    <0> ffff88010d26bd80 ffff880117876800 ffff8801150d1df8 ffff8801150d1ea8
    Call Trace:
    [<ffffffffa043b8e6>] ? intel_ring_advance+0x16/0x20 [i915]
    [<ffffffffa041768b>] i915_irq_emit+0x15b/0x240 [i915]
    [<ffffffffa03ea7b1>] drm_ioctl+0x1f1/0x460 [drm]
    [<ffffffffa0417530>] ? i915_irq_emit+0x0/0x240 [i915]
    [<ffffffff810dd8f1>] ? do_sync_read+0xd1/0x120
    [<ffffffff81025b1f>] ? do_page_fault+0x1df/0x3d0
    [<ffffffff810ed5c7>] do_vfs_ioctl+0x97/0x550
    [<ffffffff8115c2ea>] ? security_file_permission+0x7a/0x90
    [<ffffffff810edb19>] sys_ioctl+0x99/0xa0
    [<ffffffff810024ab>] system_call_fastpath+0x16/0x1b
    Code: Bad RIP value.
    RIP [<(null)>] (null)
    RSP <ffff8801150d1d40>
    CR2: 0000000000000000

    Reported-by: Herbert Xu <herbert@gondor.apana.org.au>
    Tested-by: Herbert Xu <herbert@gondor.apana.org.au>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=29153
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=23172
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: stable@kernel.org
Here is how it BUGs, with the relevant disassembly (kernel compiled with debug info):

[ 92.113090] BUG: unable to handle kernel NULL pointer dereference at 00000084
[ 92.113115] IP: [<c11efb2f>] i915_driver_irq_handler+0x11f/0xa70
[ 92.113136] *pde = 00000000
[ 92.113145] Oops: 0000 [#1] PREEMPT
[ 92.113157] last sysfs file: /sys/devices/virtual/dmi/id/board_asset_tag
[ 92.113166] Modules linked in:
[ 92.113175]
[ 92.113184] Pid: 0, comm: swapper Not tainted 2.6.37--NAVY-08012-ge8616b6 #23 PCISA-945GSE(B125)/PCISA-945GSE
[ 92.113194] EIP: 0060:[<c11efb2f>] EFLAGS: 00010082 CPU: 0
[ 92.113203] EIP is at i915_driver_irq_handler+0x11f/0xa70
[ 92.113211] EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: dffe1898
[ 92.113219] ESI: ded38000 EDI: 00000000 EBP: dec09fc0 ESP: dec09f3c
[ 92.113227] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[ 92.113236] Process swapper (pid: 0, ti=dec08000 task=c1464340 task.ti=c145e000)
[ 92.113242] Stack:
[ 92.113248] 00000002 c140d160 c1395d0d c140df49 de458e00 dec09f64 c12335ef 00004000
[ 92.113276] de457400 de458e00 dec09f90 c123c958 dec09f84 de458fbc ded382c4 ded3801c

c11efa10 <i915_driver_irq_handler>:
                intel_prepare_page_flip(dev, intel_crtc->plane);
        }
}

irqreturn_t i915_driver_irq_handler(DRM_IRQ_ARGS)
{
        .
        .
        .
                        break;

                ret = IRQ_HANDLED;

                /* Consume port. Then clear IIR or we'll miss events */
                if ((I915_HAS_HOTPLUG(dev)) &&
c11efadd:  8b 55 c0              mov    -0x40(%ebp),%edx
c11efae0:  8b 82 f4 01 00 00     mov    0x1f4(%edx),%eax
c11efae6:  8b 40 04              mov    0x4(%eax),%eax
c11efae9:  f6 40 02 10           testb  $0x10,0x2(%eax)
c11efaed:  74 0c                 je     c11efafb <i915_driver_irq_handler+0xeb>
c11efaef:  f7 c7 00 00 02 00     test   $0x20000,%edi
c11efaf5:  0f 85 1d 02 00 00     jne    c11efd18 <i915_driver_irq_handler+0x308>
c11efafb:  8b 46 10              mov    0x10(%esi),%eax
c11efafe:  05 a4 20 00 00        add    $0x20a4,%eax
c11efb03:  89 38                 mov    %edi,(%eax)
{ asm volatile("mov" size " %0,%1": :reg (val), \
"m" (*(volatile type __force *)addr) barrier); }

build_mmio_read(readb, "b", unsigned char, "=q", :"memory")
build_mmio_read(readw, "w", unsigned short, "=r", :"memory")
build_mmio_read(readl, "l", unsigned int, "=r", :"memory")
c11efb05:  8b 46 10              mov    0x10(%esi),%eax
c11efb08:  05 a4 20 00 00        add    $0x20a4,%eax
c11efb0d:  8b 18                 mov    (%eax),%ebx
                }

                I915_WRITE(IIR, iir);
                new_iir = I915_READ(IIR); /* Flush posted writes */

                if (dev->primary->master) {
c11efb0f:  8b 55 c0              mov    -0x40(%ebp),%edx
c11efb12:  8b 82 20 02 00 00     mov    0x220(%edx),%eax
c11efb18:  8b 80 e4 00 00 00     mov    0xe4(%eax),%eax
c11efb1e:  85 c0                 test   %eax,%eax
c11efb20:  74 19                 je     c11efb3b <i915_driver_irq_handler+0x12b>
                        master_priv = dev->primary->master->driver_priv;
                        if (master_priv->sarea_priv)
c11efb22:  8b 40 5c              mov    0x5c(%eax),%eax
c11efb25:  8b 50 04              mov    0x4(%eax),%edx
c11efb28:  85 d2                 test   %edx,%edx
c11efb2a:  74 0f                 je     c11efb3b <i915_driver_irq_handler+0x12b>
                                master_priv->sarea_priv->last_dispatch =
c11efb2c:  8b 46 4c              mov    0x4c(%esi),%eax
c11efb2f:  8b 80 84 00 00 00     mov    0x84(%eax),%eax
c11efb35:  89 82 08 08 00 00     mov    %eax,0x808(%edx)
                                        READ_BREADCRUMB(dev_priv);
                }

                if (iir & I915_USER_INTERRUPT)
c11efb3b:  f7 c7 02 00 00 00     test   $0x2,%edi
c11efb41:  0f 85 b9 01 00 00     jne    c11efd00 <i915_driver_irq_handler+0x2f0>
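Reading the listing: the faulting EIP c11efb2f is the instruction `mov 0x84(%eax),%eax`, and EAX is 00000000 in the register dump, which gives exactly the reported fault address 00000084. The pointer loaded just before it, from 0x4c(%esi), is what READ_BREADCRUMB(dev_priv) then indexes. The sketch below redoes that arithmetic in Python. It assumes that READ_BREADCRUMB reads a 32-bit slot from the ring's hardware status page and that the slot index (I915_BREADCRUMB_INDEX) is 0x21; both are recalled from the i915 headers of that era and are not confirmed anywhere in this report.

```python
DWORD_SIZE = 4                   # status page slots are 32-bit words
ASSUMED_BREADCRUMB_INDEX = 0x21  # assumption, see the note above


def fault_address(base, index, elem_size=DWORD_SIZE):
    """Address touched when reading element `index` of an array at `base`."""
    return base + index * elem_size


def index_from_fault(fault_addr, elem_size=DWORD_SIZE):
    """Recover the array index from a fault address, assuming a NULL base.

    Returns None when the address is not a multiple of the element size,
    in which case the NULL-base array-read explanation does not fit.
    """
    if fault_addr % elem_size != 0:
        return None
    return fault_addr // elem_size


# NULL base pointer plus the assumed breadcrumb slot.
print(hex(fault_address(0x0, ASSUMED_BREADCRUMB_INDEX)))
# Going the other way from the address in the oops.
print(hex(index_from_fault(0x84)))
```

If the assumed index is right, 0x21 * 4 = 0x84, which would be consistent with the status page pointer still being NULL when the interrupt handler runs on this non-KMS path.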
First-Bad-Commit : e8616b6ced6137085e6657cc63bc2fe3900b8616
With v2.6.39 and the same userspace setup it does not panic, but X refuses to start at all:

...
(II) intel(0): Selecting standard 18 bit TMDS pixel format.
(II) intel(0): Output configuration:
(II) intel(0): Pipe A is on
(II) intel(0): Display plane A is now enabled and connected to pipe A.
(II) intel(0): Pipe B is on
(II) intel(0): Display plane B is now enabled and connected to pipe B.
(II) intel(0): Output VGA is connected to pipe A
(II) intel(0): Output LVDS is connected to pipe B
(II) intel(0): Output TV is connected to pipe none
(EE) intel(0): [drm] failure adding irq handler
(II) intel(0): [drm] removed 1 reserved context for kernel
(II) intel(0): [drm] unmapping 8192 bytes of SAREA 0xdfff7000 at 0xb7252000
(II) intel(0): [drm] Closed DRM master.

Fatal server error:
AddScreen/ScreenInit failed for driver 0

with no messages in dmesg.
What's going on? Are we breaking more and more stuff with each release? Also, I thought there was a "no regressions" rule, but even detailed, bisected bug reports stay without a reply...
Elevating to blocking, since this basically makes the system useless... I no longer have my original Gigabyte motherboard, and will test against my new motherboard when I am forced to reboot it again.
As I mentioned, I no longer have this motherboard. Furthermore, Chris Wilson explains that Kirill Smellkov's details/bisection are for an unrelated bug. Therefore, unless someone else can reproduce this bug, it should probably be RESOLVED UNREPRODUCABLE.
Luke, could you please point me to where Chris Wilson "explains that Kirill Smellkov's details/bisection are for an unrelated bug"? Yes, my bug may be different from your original problem (that's my fault), but I still can't see any reply from Chris here.

Anyway, I've created a new, separate bugzilla entry for the NULL pointer dereference in i915_driver_irq_handler on X startup: bug 36052, "System hang during X startup (non-kms, regression, bisected)"
https://lkml.org/lkml/2011/5/21/105
Thanks