Bug 58511

Summary: [drm:i915_gem_object_bind_to_gtt] *ERROR* Attempting to bind an object larger than the aperture
Product: Drivers
Reporter: hreuver (h.reuver)
Component: Video(DRI - Intel)
Assignee: intel-gfx-bugs (intel-gfx-bugs)
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Severity: normal
CC: chris, daniel, intel-gfx-bugs, nix.sasl
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.2.45 (vanilla)
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: Kernel configuration used, Extended output from dmesg

Description hreuver 2013-05-19 13:44:10 UTC
When starting I see the following error in dmesg:

[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 3.2.45-vanilla-1 (@) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #1 SMP Sat May 18 17:08:46 CEST 2013
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000] Disabled fast string operations
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000008f000 (usable)
[    0.000000]  BIOS-e820: 000000000008f000 - 00000000000a0000 (reserved)
[    0.000000]  BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 000000007f533000 (usable)
[    0.000000]  BIOS-e820: 000000007f533000 - 000000007f53b000 (reserved)
[    0.000000]  BIOS-e820: 000000007f53b000 - 000000007f5cb000 (usable)
[    0.000000]  BIOS-e820: 000000007f5cb000 - 000000007f5cf000 (reserved)
[    0.000000]  BIOS-e820: 000000007f5cf000 - 000000007f660000 (usable)
[    0.000000]  BIOS-e820: 000000007f660000 - 000000007f6f0000 (ACPI NVS)
[    0.000000]  BIOS-e820: 000000007f6f0000 - 000000007f6f2000 (usable)
[    0.000000]  BIOS-e820: 000000007f6f2000 - 000000007f6ff000 (ACPI data)
[    0.000000]  BIOS-e820: 000000007f6ff000 - 000000007f700000 (usable)
[    0.000000]  BIOS-e820: 000000007f700000 - 0000000080000000 (reserved)
[    0.000000]  BIOS-e820: 00000000f0000000 - 0000000100000000 (reserved)
[    0.000000] Notice: NX (Execute Disable) protection cannot be enabled: non-PAE kernel!
[    0.000000] SMBIOS 2.4 present.
[    0.000000] DMI:                  /D945GCLF2, BIOS LF94510J.86A.0099.2008.0731.0303 07/31/2008
[    0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
[    0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)

...
[    8.239372] agpgart-intel 0000:00:00.0: detected gtt size: 131072K total, 131072K mappable
[    8.338554] agpgart-intel 0000:00:00.0: detected 8192K stolen memory
[    8.414850] agpgart-intel 0000:00:00.0: AGP aperture is 128M @ 0x80000000
[    8.496173] [drm] Initialized drm 1.1.0 20060810
[    8.551408] i915 0000:00:02.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[    8.632610] i915 0000:00:02.0: setting latency timer to 64
[    8.670948] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[    8.750094] [drm] Driver supports precise vblank timestamp query.
[    8.823492] vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
[    9.654639] [drm] initialized overlay support
[    9.914095] [drm:i915_gem_object_bind_to_gtt] *ERROR* Attempting to bind an object larger than the aperture
[   10.030669] [drm:intelfb_create] *ERROR* failed to pin fb: -7
[   10.099411] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
[   10.186972] loop: module loaded
...
[   33.852048] [drm:i915_gem_object_bind_to_gtt] *ERROR* Attempting to bind an object larger than the aperture
[   33.968581] [drm:intel_pipe_set_base] *ERROR* pin & fence failed
[   34.040527] [drm:drm_crtc_helper_set_config] *ERROR* failed to set mode on [CRTC:3]
[   38.016045] [drm:i915_gem_object_bind_to_gtt] *ERROR* Attempting to bind an object larger than the aperture
[   38.132715] [drm:intel_pipe_set_base] *ERROR* pin & fence failed
[   38.204545] [drm:drm_crtc_helper_set_config] *ERROR* failed to set mode on [CRTC:3]
[   42.160035] [drm:i915_gem_object_bind_to_gtt] *ERROR* Attempting to bind an object larger than the aperture
[   42.276570] [drm:intel_pipe_set_base] *ERROR* pin & fence failed
[   42.348416] [drm:drm_crtc_helper_set_config] *ERROR* failed to set mode on [CRTC:3]

This problem prevents the kernel driver, which X needs, from loading.
I used the same configuration for the 3.2.44 and the 3.2.45 kernel.
The problem occurs only in 3.2.45.

This is part of the dmesg output from the working 3.2.44 kernel:

[    8.226755] agpgart-intel 0000:00:00.0: detected 8192K stolen memory
[    8.303065] agpgart-intel 0000:00:00.0: AGP aperture is 128M @ 0x80000000
[    8.384370] [drm] Initialized drm 1.1.0 20060810
[    8.439605] i915 0000:00:02.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[    8.520816] i915 0000:00:02.0: setting latency timer to 64
[    8.560221] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[    8.639338] [drm] Driver supports precise vblank timestamp query.
[    8.712344] vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
[    9.546129] [drm] initialized overlay support
[    9.819177] fbcon: inteldrmfb (fb0) is primary device
[    9.922031] Console: switching to colour frame buffer device 210x65
[   10.079026] fb0: inteldrmfb frame buffer device
[   10.133227] drm: registered panic notifier
[   10.182246] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
Comment 1 hreuver 2013-05-19 14:02:36 UTC
Looks like the same bug as 58151.
Comment 2 Chris Wilson 2013-05-20 11:00:18 UTC
Can you please improve the "*ERROR* Attempting to bind an object larger than the aperture" so that it includes the requested size and the aperture size so that we can see which of those values is odd?
Comment 3 hreuver 2013-05-20 12:35:20 UTC
I need to recompile the kernel: because it is optimized for one specific PC, the drm drivers were not built as modules, and I removed the kernel sources by mistake. I have also enabled debugging for the i915/drm modules.

Sorry, this will take a little time.

Will be continued later.
Comment 4 hreuver 2013-05-20 15:20:35 UTC
Need some help here:
The kernel is configured; this is the graphics driver part of the configuration:
CONFIG_AGP=y
CONFIG_AGP_INTEL=y
CONFIG_VGA_ARB=y
CONFIG_VGA_ARB_MAX_GPUS=16
CONFIG_VGA_SWITCHEROO=y
CONFIG_DRM=m
CONFIG_DRM_KMS_HELPER=m
CONFIG_DRM_I915=m
CONFIG_DRM_I915_KMS=y
CONFIG_VIDEO_OUTPUT_CONTROL=y
CONFIG_FB=y
CONFIG_FIRMWARE_EDID=y
CONFIG_FB_DDC=m
CONFIG_FB_CFB_FILLRECT=m
CONFIG_FB_CFB_COPYAREA=m
CONFIG_FB_CFB_IMAGEBLIT=m
CONFIG_FB_MODE_HELPERS=y
CONFIG_FB_TILEBLITTING=y
CONFIG_FB_INTEL=m
CONFIG_FB_INTEL_DEBUG=y
CONFIG_FB_INTEL_I2C=y
CONFIG_BACKLIGHT_LCD_SUPPORT=y
CONFIG_LCD_CLASS_DEVICE=m
CONFIG_BACKLIGHT_CLASS_DEVICE=y
CONFIG_DISPLAY_SUPPORT=m

Problem: the module is loaded when I log in and I can't unload it since it's being used. Where can I see what memory is requested? Which options do I need to use, and where?

I tried unloading and loading the module manually, but without result.

The fact that the system is not working does not mean the module can't be removed, unfortunately. Some service is still locking the i915 module...
Comment 5 Chris Wilson 2013-05-20 15:33:33 UTC
We need to add something like:

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b3c8abd..f0b9ee81 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2995,7 +2995,10 @@ i915_gem_object_bind_to_gtt(struct drm_i915_gem_object *obj,
 	 */
 	if (obj->base.size >
 	    (map_and_fenceable ? dev_priv->gtt.mappable_end : dev_priv->gtt.total)) {
-		DRM_ERROR("Attempting to bind an object larger than the aperture\n");
+		DRM_ERROR("Attempting to bind an object larger than the aperture: object=%ld > %s aperture=%ld\n",
+			  obj->base.size,
+			  map_and_fenceable ? "mappable" : "total",
+			  map_and_fenceable ? dev_priv->gtt.mappable_end : dev_priv->gtt.total);
 		return -E2BIG;
 	}
Comment 6 hreuver 2013-05-20 16:17:22 UTC
Compilation is running. Hope to post the results later this evening.

Rewrote the patch to:
--- drivers/gpu/drm/i915/i915_gem.c     2013-05-20 18:12:16.045666581 +0200
+++ ../linux-3.2.45-2/drivers/gpu/drm/i915/i915_gem.c   2013-05-20 18:06:12.947864729 +0200
@@ -2759,7 +2759,10 @@
         */
        if (obj->base.size >
            (map_and_fenceable ? dev_priv->mm.gtt_mappable_end : dev_priv->mm.gtt_total)) {
-               DRM_ERROR("Attempting to bind an object larger than the aperture\n");
+                DRM_ERROR("Attempting to bind an object larger than the aperture: object=%ld > %s aperture=%ld\n",
+                      obj->base.size,
+                      map_and_fenceable ? "mappable" : "total",
+                      map_and_fenceable ? dev_priv->mm.gtt_mappable_end : dev_priv->mm.gtt_total);
                return -E2BIG;
        }
Comment 7 hreuver 2013-05-20 17:06:11 UTC
With the patch the dmesg information reads like this:

[   18.886066] [drm] initialized overlay support
[   19.102610] [drm:i915_gem_object_bind_to_gtt] *ERROR* Attempting to bind an object larger than the aperture: object=7057408 > mappable aperture=0
[   19.258809] [drm:intelfb_create] *ERROR* failed to pin fb: -7
[   19.328556] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0

For completeness I'll add the used config later.
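
The numbers in the patched message can be sanity-checked with a little arithmetic. The sketch below assumes a 1680x1050 mode at 4 bytes per pixel (an inference from the 210x65 console in the 3.2.44 log, not something stated in this report) and 4096-byte pages; the helper names are made up for illustration and are not driver functions. Under those assumptions the framebuffer comes out at exactly the reported 7057408 bytes, far below the 131072K mappable size that agpgart detected, which suggests the odd value is the aperture of 0 rather than the object size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Round a byte count up to a whole number of pages. */
size_t page_align(size_t bytes, size_t page_size)
{
    assert(page_size > 0);
    return (bytes + page_size - 1) / page_size * page_size;
}

/* Size of a linear framebuffer, assuming no stride padding and
 * 4096-byte pages (both are assumptions, not taken from the report). */
size_t fb_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    return page_align(width * height * bytes_per_pixel, 4096);
}

/* Same comparison as the check quoted in comment 5:
 * true means the bind is rejected with -E2BIG. */
bool larger_than_aperture(size_t object_size, size_t aperture_end)
{
    return object_size > aperture_end;
}
```

With these helpers, `fb_bytes(1680, 1050, 4)` gives 7057408, and `larger_than_aperture(7057408, 0)` is true, while the same object against a 128M aperture (134217728 bytes) would pass the check.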
Comment 8 hreuver 2013-05-20 17:07:20 UTC
Created attachment 102101 [details]
Kernel configuration used
Comment 9 hreuver 2013-05-20 17:10:43 UTC
Created attachment 102111 [details]
Extended output from dmesg 

Just for completeness the drm debug output from dmesg.
(cmdline: BOOT_IMAGE=/boot/vmlinuz-3.2.45-vanilla-2 root=/dev/sda2 ro drm.debug=14)

Just lots of messages.
Comment 10 Daniel Vetter 2013-05-20 18:21:15 UTC
Bisect result would be rather interesting here. Also, are more recent kernels (like 3.9) also affected?
Comment 11 hreuver 2013-05-20 18:40:05 UTC
Haven't used 3.9.x yet, since I haven't had the time to configure it.

I'll see if I can find some time later this week.
Comment 12 hreuver 2013-05-20 22:04:15 UTC
First results:
3.9.3 works fine
The first bisection step between 3.2.44 and 3.2.45 works fine too (1st of 6 steps...).

I'll continue bisecting later this week.
Comment 13 hreuver 2013-05-21 06:03:42 UTC
Results of bisecting are a little unexpected:
 $ git bisect good
..
 $ git bisect good
..
 $ git bisect good
Bisecting: 14 revisions left to test after this (roughly 4 steps)
[40c157ba78681c45cc62dabde406b44ca3c76c2b] iucv: Fix missing msg_namelen update in  iucv_sock_recvmsg()
..
$ git bisect good
Bisecting: 7 revisions left to test after this (roughly 3 steps)
[8431bc6fb3dc3784973cc9471197e34b16f38b3b] sparc64: Fix race in TLB batch processing.
..
$ git bisect good
Bisecting: 3 revisions left to test after this (roughly 2 steps)
[0e6f42bb651bb61744d529a9cfe540e292fad98a] kernel/audit_tree.c: tree will leak memory when failure occurs in audit_trim_trees()
..
 $ git bisect bad
..
 $ git bisect bad
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[03000102c151f4dab9a38aee831182df3be748d1] r8169: fix 8168evl frame padding.

Could anyone point me to instructions on how to check the last patch, just to be sure?
Comment 14 Daniel Vetter 2013-05-21 07:26:35 UTC
git bisect output is a bit confusing, but you still have one more step to go (i.e. you still need to tell git bisect whether the commit 03000102c151f4da is good/bad). Then git bisect should tell you the first bad commit.
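
Why one more verdict is needed can be seen in a toy model of bisection. The sketch below is a simplification: it treats the history as a straight line of numbered commits, and `commit_is_bad` is a hypothetical stand-in for "build, boot and check dmesg". Real git bisect works on a commit graph and counts remaining revisions differently, so the step counts here are only illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical test oracle: commits are numbered in order and every
 * commit from first_bad onwards shows the bug. */
bool commit_is_bad(int commit, int first_bad)
{
    return commit >= first_bad;
}

/* Binary search for the first bad commit in (good, bad].
 * 'good' is known good, 'bad' is known bad; *tests counts verdicts given. */
int bisect_first_bad(int good, int bad, int first_bad, int *tests)
{
    assert(good < bad);
    *tests = 0;
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;  /* commit checked out for testing */
        ++*tests;
        if (commit_is_bad(mid, first_bad))
            bad = mid;   /* equivalent of "git bisect bad" */
        else
            good = mid;  /* equivalent of "git bisect good" */
    }
    return bad;          /* the first bad commit */
}
```

When a single untested commit remains between the last known-good and the first known-bad commit, the loop still runs once: the answer is that commit if it is bad, or the already known-bad one if it is good. That matches the situation above, where the checked-out commit still needs a verdict before the first bad commit can be named.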
Comment 15 hreuver 2013-05-21 18:28:36 UTC
$ git bisect good
53e587aa5ca81497d0ea6e340320ec5778d1f311 is the first bad commit
commit 53e587aa5ca81497d0ea6e340320ec5778d1f311
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Nov 15 11:32:18 2012 +0000

    drm/i915: Fix detection of base of stolen memory
    
    commit e12a2d53ae45a69aea499b64f75e7222cca0f12f upstream.
    
    The routine to query the base of stolen memory was using the wrong
    registers and the wrong encodings on virtually every platform.
    
    It was not until the G33 refresh, that a PCI config register was
    introduced that explicitly said where the stolen memory was. Prior to
    865G there was not even a register that said where the end of usable
    low memory was and where the stolen memory began (or ended depending
    upon chipset). Before then, one has to look at the BIOS memory maps to
    find the Top of Memory. Alas that is not exported by arch/x86 and so we
    have to resort to disabling stolen memory on gen2 for the time being.
    
    Then SandyBridge enlarged the PCI register to a full 32-bits and change
    the encoding of the address, so even though we happened to be querying
    the right register, we read the wrong bits and ended up using address 0
    for our stolen data, i.e. notably FBC.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    [bwh: Backported to 3.2: adjust filename, context]
    Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

:040000 040000 323812cac24bd796efa71220b496ee5a2a00d298 101967a69bc453f845daf3a7a3024ebd932984dc M      drivers
Comment 16 Daniel Vetter 2013-05-21 18:48:02 UTC
Hm, that's interesting. Can you please test 3.9.x also to check whether this regression is due to backporting into 3.2.x kernels or a general issue with the upstream code?

I suspect the former since the code seems to work on most upstream systems and shouldn't even be used on 3.2 for anything at all.
Comment 17 Chris Wilson 2013-05-21 18:59:19 UTC
Oh, I see the problem now: the stray 'return 0' left over from moving the code from i915_gem_init_stolen to i915_load_gem_init.
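
The failure mode described here can be illustrated with a deliberately simplified sketch. Everything in it is hypothetical: the struct, the field names and both functions are invented for illustration and are not the actual backported code, which is not quoted in this thread. It only shows how a `return 0` carried along with a moved function body can end the caller early, so that later initialisation never runs and a size stays at 0, which would be consistent with the "mappable aperture=0" in comment 7.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for the driver's private state. */
struct fake_priv {
    size_t stolen_size;
    size_t gtt_mappable_end;
};

/* Buggy variant: the body of a helper was pasted in together with
 * its final "return 0", so the rest of the function is dead code. */
int load_gem_init_buggy(struct fake_priv *p, size_t stolen, size_t mappable)
{
    p->stolen_size = stolen;        /* code moved in from the helper */
    return 0;                       /* stray return from the helper */

    p->gtt_mappable_end = mappable; /* never reached */
    return 0;
}

/* Fixed variant: the stray return is gone, so setup completes. */
int load_gem_init_fixed(struct fake_priv *p, size_t stolen, size_t mappable)
{
    p->stolen_size = stolen;
    p->gtt_mappable_end = mappable;
    return 0;
}
```

Both variants report success to the caller, which is what makes this kind of bug quiet: in the buggy one `gtt_mappable_end` simply keeps its initial value of 0, and the problem only surfaces later when something is compared against it.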
Comment 18 hreuver 2013-05-21 20:09:08 UTC
See comment #12: "3.9.3 works fine".

Thanks for your support.
Comment 19 Daniel Vetter 2013-05-21 20:24:47 UTC
(In reply to comment #18)
> See comment #12: "3.9.3 works fine".

Oops, I've tried to look for it but failed to find it. I'll send a mail to the 3.2 stable maintainer asking him to revert this patch. That should take care of things.

Thanks a lot for reporting this issue.
Comment 20 Daniel Vetter 2013-05-21 20:26:22 UTC
*** Bug 58151 has been marked as a duplicate of this bug. ***
Comment 21 Daniel Vetter 2013-05-21 20:28:25 UTC
Since I'll likely lose track of this, I'm closing it. Please yell if the revert doesn't land in a 3.2.x release soon.
Comment 22 Nix\ 2013-05-23 05:45:46 UTC
3.9.2 works fine.
@Daniel Vetter, please give me the patch so I can test it on 3.2.44 and confirm.
Comment 23 Nix\ 2013-05-23 06:41:13 UTC
@Daniel Vetter

I can confirm. Reverting commit 53e587aa5ca81497d0ea6e340320ec5778d1f311
(saved as a .patch file) solves the bug:


[root@nix linux-3.2.45]# patch -R -p1 < patch-revert-i915.patch 
patching file drivers/gpu/drm/i915/i915_dma.c
patching file drivers/gpu/drm/i915/i915_drv.h
patch unexpectedly ends in middle of line
Hunk #1 succeeded at 581 with fuzz 1.


Boot with screen, fb, backlight control and resolution ok.