Bug 15327

Summary: [855GM] random freezes
Product: Drivers
Reporter: Vasyl Demin (vasyl.demin)
Component: Video(DRI - Intel)
Assignee: Jesse Barnes (jbarnes)
Status: CLOSED CODE_FIX
Severity: normal
CC: chris, jbarnes, maciej.rutecki, rjw, vasyl.demin
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.32.8
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 14885

Description Vasyl Demin 2010-02-16 18:31:49 UTC
Laptop: HP Compaq nx9020
Display controller: Intel Corporation 82852/855GM Integrated Graphics Device (rev 02)

OS: Arch Linux i686
kernel-2.6.32.8 with patch from http://bugzilla.kernel.org/show_bug.cgi?id=14957
xorg-server-1.7.4.901
xf86-video-intel-2.10.0
mesa-7.7

By "freeze" I mean that it's not possible to move the mouse pointer or use the keyboard. Kernel log message:

Feb 16 20:38:01 takron kernel: ------------[ cut here ]------------
Feb 16 20:38:01 takron kernel: kernel BUG at drivers/gpu/drm/i915/i915_gem.c:2108!
Feb 16 20:38:01 takron kernel: invalid opcode: 0000 [#1] PREEMPT SMP 
Feb 16 20:38:01 takron kernel: last sysfs file: /sys/devices/virtual/hwmon/hwmon0/temp1_input
Feb 16 20:38:01 takron kernel: Modules linked in: nfs lockd fscache nfs_acl auth_rpcgss sunrpc michael_mic arc4 ecb lib80211_crypt_tkip ipv6 ext2 pcmcia snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device yenta_socket rsrc_nonstatic pcmcia_core snd_pcm_oss snd_mixer_oss 8139too mii ipw2200 libipw lib80211 snd_intel8x0m joydev snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_timer iTCO_wdt iTCO_vendor_support psmouse snd fuse uhci_hcd shpchp container soundcore snd_page_alloc wmi ac battery ehci_hcd i2c_i801 sg processor pci_hotplug thermal usbcore evdev serio_raw vboxdrv rtc_cmos rtc_core rtc_lib ext4 mbcache jbd2 crc16 sr_mod sd_mod cdrom ata_piix ata_generic pata_acpi libata scsi_mod i915 drm_kms_helper drm i2c_algo_bit button i2c_core video output intel_agp agpgart
Feb 16 20:38:01 takron kernel: 
Feb 16 20:38:01 takron kernel: Pid: 4819, comm: X Not tainted (2.6.32-ARCH #1) compaq nx9020 (PG711ES#ABB)       
Feb 16 20:38:01 takron kernel: EIP: 0060:[<ee8af465>] EFLAGS: 00213246 CPU: 0
Feb 16 20:38:01 takron kernel: EIP is at i915_gem_evict_everything+0xe5/0x120 [i915]
Feb 16 20:38:01 takron kernel: EAX: ed2f2000 EBX: ed315000 ECX: 00000000 EDX: 0000fdfd
Feb 16 20:38:01 takron kernel: ESI: ed399400 EDI: ed315e0c EBP: ed315e20 ESP: ed2f3da4
Feb 16 20:38:01 takron kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Feb 16 20:38:01 takron kernel: Process X (pid: 4819, ti=ed2f2000 task=c21b0c90 task.ti=ed2f2000)
Feb 16 20:38:01 takron kernel: Stack:
Feb 16 20:38:01 takron kernel: 00000000 0000000a c02e4480 0000000a c02e49c0 ee8b0a80 00203292 ed278184
Feb 16 20:38:01 takron kernel: <0> 00203246 ed278000 00203286 00000001 c129a088 d6c53300 ed315000 c12b8a9c
Feb 16 20:38:01 takron kernel: <0> de3b9940 c2381800 c22268a0 d12bf380 ed315000 c02e4480 de3b9800 c2381800
Feb 16 20:38:01 takron kernel: Call Trace:
Feb 16 20:38:01 takron kernel: [<ee8b0a80>] ? i915_gem_execbuffer+0x830/0x1310 [i915]
Feb 16 20:38:01 takron kernel: [<c129a088>] ? unix_stream_recvmsg+0x218/0x540
Feb 16 20:38:01 takron kernel: [<c12b8a9c>] ? __mutex_lock_slowpath+0x1ec/0x2c0
Feb 16 20:38:01 takron kernel: [<ee75b298>] ? drm_ioctl+0x158/0x320 [drm]
Feb 16 20:38:01 takron kernel: [<ee8b0250>] ? i915_gem_execbuffer+0x0/0x1310 [i915]
Feb 16 20:38:01 takron kernel: [<c10e3c95>] ? do_sync_read+0xd5/0x120
Feb 16 20:38:01 takron kernel: [<c10f1d29>] ? vfs_ioctl+0x89/0xa0
Feb 16 20:38:01 takron kernel: [<c10f1ea9>] ? do_vfs_ioctl+0x79/0x5c0
Feb 16 20:38:01 takron kernel: [<ee8b18b0>] ? i915_gem_fault+0x0/0x150 [i915]
Feb 16 20:38:01 takron kernel: [<c10e3d46>] ? rw_verify_area+0x66/0xe0
Feb 16 20:38:01 takron kernel: [<c1064a30>] ? ktime_get_ts+0xd0/0x100
Feb 16 20:38:01 takron kernel: [<c10f2466>] ? sys_ioctl+0x76/0x90
Feb 16 20:38:01 takron kernel: [<c10039f3>] ? sysenter_do_call+0x12/0x28
Feb 16 20:38:01 takron kernel: Code: c0 89 c1 75 a6 89 f0 e8 ca fa ff ff 85 c0 89 c1 75 99 89 f8 89 0c 24 e8 6a ab a0 d2 3b ab 20 0e 00 00 74 0b 89 f8 e8 3b ae a0 d2 <0f> 0b eb fe 8d 83 18 0e 00 00 39 83 18 0e 00 00 75 e7 8d 83 10 
Feb 16 20:38:01 takron kernel: EIP: [<ee8af465>] i915_gem_evict_everything+0xe5/0x120 [i915] SS:ESP 0068:ed2f3da4
Feb 16 20:38:01 takron kernel: ---[ end trace d4b77122adeb6f75 ]---
Comment 1 Jesse Barnes 2010-02-19 20:40:14 UTC
Can you test the drm-intel-next branch from git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel.git?
Comment 2 Vasyl Demin 2010-02-19 21:17:29 UTC
Ok, I'll try it
Comment 3 Rafael J. Wysocki 2010-02-22 21:39:05 UTC
BTW, what was the last working kernel?
Comment 4 Vasyl Demin 2010-02-23 11:09:25 UTC
2.6.32.7
Comment 5 Rafael J. Wysocki 2010-02-23 19:42:03 UTC
Well, it should be pretty straightforward to carry out bisection of commits between 2.6.32.7 and 2.6.32.8.  Can you please try that?
Comment 6 Chris Wilson 2010-02-23 19:53:03 UTC
Interaction with userspace would muddle the bisection. This bug should be fixed by:

commit 99fcb766a3a50466fe31d743260a3400c1aee855
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Sun Feb 7 16:20:18 2010 +0100

    drm/i915: Update write_domains on active list after flush.
    
    Before changing the status of a buffer with a pending write we will await
    upon a new flush for that buffer. So we can take advantage of any flushes
    posted whilst the buffer is active and pending processing by the GPU, by
    clearing its write_domain and updating its last_rendering_seqno -- thus
    saving a potential flush in deep queues and improves flushing behaviour
    upon eviction for both GTT space and fences.
    
    In order to reduce the time spent searching the active list for matching
    write_domains, we move those to a separate list whose elements are
    the buffers belong to the active/flushing list with pending writes.
    
    Orignal patch by Chris Wilson <chris@chris-wilson.co.uk>, forward-ported
    by me.
    
    In addition to better performance, this also fixes a real bug. Before
    this changes, i915_gem_evict_everything didn't work as advertised. When
    the gpu was actually busy and processing request, the flush and subsequent
    wait would not move active and dirty buffers to the inactive list, but
    just to the flushing list. Which triggered the BUG_ON at the end of this
    function. With the more tight dirty buffer tracking, all currently busy and
    dirty buffers get moved to the inactive list by one i915_gem_flush operation.
    
    I've left the BUG_ON I've used to prove this in there.
    
    References:
      Bug 25911 - 2.10.0 causes kernel oops and system hangs
      http://bugs.freedesktop.org/show_bug.cgi?id=25911
    
      Bug 26101 - [i915] xf86-video-intel 2.10.0 (and git) triggers kernel oops
                  within seconds after login
      http://bugs.freedesktop.org/show_bug.cgi?id=26101
    
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Tested-by: Adam Lantos <hege@playma.org>
    Cc: stable@kernel.org
    Signed-off-by: Eric Anholt <eric@anholt.net>
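
The list handling described in the commit message above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the i915 code: every name here (toy_buf, toy_flush, toy_retire, toy_evict_everything, the track_active switch) is hypothetical, and the three lists are reduced to a per-buffer tag. With track_active off, a dirty buffer that is still active when the flush is emitted ends up on the flushing list after the wait, which is the state that trips the BUG_ON. With it on, the flush clears the write domain and bumps last_rendering_seqno, so one flush plus one wait leaves everything inactive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the active, flushing and inactive lists. */
enum toy_list { TOY_ACTIVE, TOY_FLUSHING, TOY_INACTIVE };

struct toy_buf {
    enum toy_list list;
    unsigned write_domain;          /* non-zero: GPU write not yet flushed */
    unsigned last_rendering_seqno;  /* request that last touched the buffer */
};

/*
 * Emit a flush as request `seqno`. Buffers on the flushing list always
 * become active again, waiting for the flush. Dirty buffers that are still
 * active are only covered when track_active is set (the behaviour the
 * commit message describes as the fix).
 */
static void toy_flush(struct toy_buf *b, size_t n, unsigned seqno,
                      bool track_active)
{
    assert(b != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (b[i].write_domain == 0)
            continue;
        if (b[i].list == TOY_FLUSHING ||
            (b[i].list == TOY_ACTIVE && track_active)) {
            b[i].write_domain = 0;
            b[i].last_rendering_seqno = seqno;
            b[i].list = TOY_ACTIVE;
        }
    }
}

/* Retire every active buffer whose last request has completed. */
static void toy_retire(struct toy_buf *b, size_t n, unsigned completed)
{
    assert(b != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (b[i].list != TOY_ACTIVE || b[i].last_rendering_seqno > completed)
            continue;
        /* Still dirty: parked on the flushing list instead of inactive. */
        b[i].list = b[i].write_domain ? TOY_FLUSHING : TOY_INACTIVE;
    }
}

/*
 * One flush, then wait for it. Returns true if every buffer reached the
 * inactive list; false models the condition that hit the BUG_ON.
 */
static bool toy_evict_everything(struct toy_buf *b, size_t n,
                                 unsigned flush_seqno, bool track_active)
{
    toy_flush(b, n, flush_seqno, track_active);
    toy_retire(b, n, flush_seqno);
    for (size_t i = 0; i < n; i++)
        if (b[i].list != TOY_INACTIVE)
            return false;
    return true;
}
```

Calling toy_evict_everything on one dirty active buffer and one clean active buffer returns false with track_active off and true with it on, which mirrors the before and after behaviour the commit message describes.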
Comment 7 Jesse Barnes 2010-02-26 23:03:06 UTC
OK, closing per Chris's comment.