Bug 14728

Summary: Graphic corruption
Product: Drivers
Reporter: Kornel Lugosi (coornail)
Component: Video(DRI - Intel)
Assignee: drivers_video-dri-intel (drivers_video-dri-intel)
Status: CLOSED DUPLICATE
Severity: high
CC: airlied, coornail, hsggebhardt, jbarnes, rjw, zhenyuw
Priority: P1
Hardware: IA-64
OS: Linux
Kernel Version: 2.6.32
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 14230
Attachments: Kernel config

Description Kornel Lugosi 2009-12-03 21:27:08 UTC
I saw graphics corruption during boot with KMS, but it affected only a few lines. When I started the X server it got worse: I couldn't see anything, and on one occasion it killed all my virtual terminals as well.
The error is also present without KMS, though not on the virtual terminal, only once X is loaded.

My configuration:
intel x3100
kernel: 2.6.32
intel driver: 2.9.1
xorg-server: 1.7.1

Sometimes Xorg.0.log doesn't contain anything meaningful, but on one occasion it logged:
Fatal server error:
Failed to map batchbuffer: Input/output error

It usually (but not always) writes something to the console (which I forgot to write down; I'll reboot and report it, sorry).

It works flawlessly with 2.6.31.[0-6].

Attached my kernel config.
Comment 1 Kornel Lugosi 2009-12-03 21:27:48 UTC
Created attachment 24007
Kernel config
Comment 2 Jesse Barnes 2009-12-03 23:36:18 UTC
Can you bisect the failure?
Comment 3 Kornel Lugosi 2009-12-04 09:20:23 UTC
I'm going to try, but I can't promise that I'll have the time today.
Comment 4 Kornel Lugosi 2009-12-06 23:32:04 UTC
I'm sorry, but it seems I won't have the time to debug this properly for a few more days.

Here's the Xorg error message that is written to the console:
intel_bufmgr_gem.c:759: Error setting domain 604: Input/output error

I also upgraded to xorg-server-1.7.3, but the bug is still present.

Has anyone else managed to reproduce this bug?
Comment 5 zhenyuw 2009-12-28 06:36:06 UTC
I bet dmesg will contain more info in case of failure.
Comment 6 Henry Gebhardt 2009-12-30 20:51:20 UTC
I think I have the same bug, starting with the 2.6.32 kernel and still present in 2.6.33-rc2. Thanks for looking into this. Some notes:

1.) I have 4GB of memory. Removing 2GB solves the problem, as suggested here:
http://bugs.freedesktop.org/show_bug.cgi?id=25510

2.) Using the "mem=3500M" kernel parameter also solves it, although I am left with only 3GB.

3.) The first bad commit is 176616814d700f19914d8509d9f65dec51a6ebf7, although the exact symptoms have changed back and forth since then. For instance, sometimes the mouse can be moved, sometimes the screen stays black, sometimes the entire machine locks up such that shutting down via ACPI no longer works, and sometimes (but rarely) the above workarounds don't help. This is the bisect message:

176616814d700f19914d8509d9f65dec51a6ebf7 is the first bad commit
commit 176616814d700f19914d8509d9f65dec51a6ebf7
Author: Zhenyu Wang <zhenyu.z.wang@intel.com>
Date:   Mon Jul 27 12:59:57 2009 +0100

    intel_agp: Use PCI DMA API correctly on chipsets new enough to have IOMMU
    
    When graphics dma remapping engine is active, we must fill
    gart table with dma address from dmar engine, as now graphics
    device access to graphics memory must go through dma remapping
    table to get real physical address.
    
    Add this support to all drivers which use intel_i915_insert_entries()
    
    Signed-off-by: Zhenyu Wang <zhenyu.z.wang@intel.com>
    Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>

:040000 040000 2e225bb921aba5b816886d0d8ada8fd81e00e7a7 03d1ce91543e61975802c3a26df1ae5b75902b51 M      drivers


4.) When it does not work, the kernel log (dmesg/syslog) reports the following (2.6.33-rc2):

[drm] Initialized drm 1.1.0 20060810
pci 0000:00:02.0: power state changed by ACPI to D0
pci 0000:00:02.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
pci 0000:00:02.0: setting latency timer to 64
pci 0000:00:02.0: irq 30 for MSI/MSI-X
acpi device:01: registered as cooling_device2
input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input7
ACPI: Video Device [VID] (multi-head: yes  rom: no  post: no)
[drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
render error detected, EIR: 0x00000000
[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 2 at 1)
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
render error detected, EIR: 0x00000000
[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 4 at 3)
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
render error detected, EIR: 0x00000000
[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 13 at 11)
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
render error detected, EIR: 0x00000000
[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 20 at 17)
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
render error detected, EIR: 0x00000000
[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 24 at 23)
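
For anyone comparing logs against the excerpt above, here is a minimal Python sketch that counts the hangcheck reports and pulls the numbers out of the i915_do_wait_request lines. The function name summarize_i915_errors and the SAMPLE string are made up for illustration; the regular expression is derived only from the lines quoted in this comment and may not match other kernel versions. The -5 return value corresponds to -EIO on Linux, which fits the "Input/output error" messages reported earlier in this bug.

```python
import re

# Pattern for the wait-request error lines as they appear in the excerpt above.
WAIT_RE = re.compile(
    r"i915_do_wait_request returns (-?\d+) \(awaiting (\d+) at (\d+)\)"
)


def summarize_i915_errors(dmesg_text):
    """Return (hang_count, waits) for a chunk of dmesg output.

    waits is a list of (return_code, awaited_value, current_value) tuples,
    one per i915_do_wait_request error line. The function name and return
    shape are hypothetical, chosen for this sketch only.
    """
    hangs = 0
    waits = []
    for line in dmesg_text.splitlines():
        if "Hangcheck timer elapsed" in line:
            hangs += 1
        match = WAIT_RE.search(line)
        if match:
            waits.append(tuple(int(group) for group in match.groups()))
    return hangs, waits


# Three lines copied from the log above, used as sample input.
SAMPLE = """\
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
render error detected, EIR: 0x00000000
[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 2 at 1)
"""

if __name__ == "__main__":
    hang_count, wait_errors = summarize_i915_errors(SAMPLE)
    print(hang_count, wait_errors)
```

Feeding it the full output of dmesg (saved to a file and read as text) should show whether a given boot hit the same hang pattern.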
Comment 7 Kornel Lugosi 2009-12-30 21:33:02 UTC
I'm glad that someone can reproduce my problem.

I don't remember seeing anything like that in dmesg, but I'll build a kernel without 176616814d700f19914d8509d9f65dec51a6ebf7 and see what happens.
Comment 8 Henry Gebhardt 2009-12-31 00:26:15 UTC
One more thing I just tested: disabling CONFIG_DMAR also solves the problem, but disabling only CONFIG_DMAR_DEFAULT_ON is not sufficient with a 2.6.33-rc2 kernel. Thanks, H.
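
To check which of these options a given kernel was built with, a small Python sketch along these lines may help. The function name dmar_options and the SAMPLE_CONFIG text are invented for illustration; the parsing relies only on the standard .config conventions ("CONFIG_X=y" for enabled options, "# CONFIG_X is not set" for disabled ones). Where the config text comes from (for example the attached kernel config) is left to the reader.

```python
import re

# Matches enabled options such as "CONFIG_DMAR=y".
SET_RE = re.compile(r"(CONFIG_DMAR\w*)=(.*)")
# Matches disabled options such as "# CONFIG_DMAR_DEFAULT_ON is not set".
UNSET_RE = re.compile(r"# (CONFIG_DMAR\w*) is not set")


def dmar_options(config_text):
    """Return a dict of CONFIG_DMAR* options found in kernel .config text.

    Disabled options are reported with the value "n". The function name is
    hypothetical and exists only for this sketch.
    """
    options = {}
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        match = SET_RE.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = UNSET_RE.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


# Made-up sample: DMAR compiled in, but not enabled by default.
SAMPLE_CONFIG = """\
CONFIG_PCI=y
CONFIG_DMAR=y
# CONFIG_DMAR_DEFAULT_ON is not set
"""

if __name__ == "__main__":
    print(dmar_options(SAMPLE_CONFIG))
```

The sample corresponds to the configuration that was not sufficient here: CONFIG_DMAR still built in, with only CONFIG_DMAR_DEFAULT_ON turned off.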
Comment 9 zhenyuw 2009-12-31 04:07:03 UTC
This is a dup of bug 14627; I have attached a patch there.
Comment 10 zhenyuw 2009-12-31 04:48:39 UTC
Please help test the new patch on bug 14627. Thanks.
Comment 11 Rafael J. Wysocki 2009-12-31 10:51:57 UTC

*** This bug has been marked as a duplicate of bug 14627 ***