Bug 16294

Summary: [Q35 bisected] hang at init of i915 driver
Product: Drivers Reporter: Kees Cook (kees)
Component: Video(DRI - Intel)Assignee: Chris Wilson (chris)
Status: CLOSED CODE_FIX    
Severity: high CC: daniel, gordon.jin, jirislaby, leann.ogasawara, maciej.rutecki, rjw, zhenyu.z.wang
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.35 Subsystem:
Regression: No Bisected commit-id:
Bug Depends on:    
Bug Blocks: 16055    
Attachments: instrumentation of i915 changes

Description Kees Cook 2010-06-25 18:20:37 UTC
commit f1befe71fa7a79ab733011b045639d8d809924ad, found via bisect, causes my system to hang at boot.  With it reverted, my system is fine.

https://launchpad.net/bugs/597075


00:02.0 0300: 8086:29b2 (rev 02)
        Subsystem: 8086:4f4a
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 47
        Region 0: Memory at e0380000 (32-bit, non-prefetchable) [size=512K]
        Region 1: I/O ports at 2458 [size=8]
        Region 2: Memory at d0000000 (32-bit, prefetchable) [size=256M]
        Region 3: Memory at e0200000 (32-bit, non-prefetchable) [size=1M]
        Capabilities: <access denied>
        Kernel driver in use: i915
        Kernel modules: i915
Comment 1 Gordon Jin 2010-06-28 06:54:29 UTC
commit f1befe71fa7a79ab733011b045639d8d809924ad
Author: Chris Wilson <email address hidden>
Date: Tue May 18 12:24:51 2010 +0100

    agp/intel: Restrict GTT mapping to valid range on i915 and i945

    References:

      Bug 15733 - Crash when accessing nonexistent GTT entries in i915
      https://bugzilla.kernel.org/show_bug.cgi?id=15733

    On G33 and above, the size of the GTT space is determined by the GMCH
    control register. Prior to this revision, the size is determined by the
    size of the aperture. So we must careful to map and fill the appropriate
    range depending on chipset.

    Signed-off-by: Chris Wilson <email address hidden>
    Signed-off-by: Eric Anholt <email address hidden>
Comment 2 Chris Wilson 2010-06-28 07:36:32 UTC
The question is why does the Q35 use the G33 driver and so uses i915_create_gatt_entries()? In all respects it seems to be a 965 part...
Comment 3 Gordon Jin 2010-06-29 02:32:18 UTC
Only G35 is 965-like, and G33/Q33/Q35 are 945-like.
Comment 4 Chris Wilson 2010-06-29 06:58:30 UTC
But do the Q33/Q35 also use the same bits as the G33 to determine the page table  size? In which case, should we just make the default more conservative?

diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
index 9344216..c1f9968 100644
--- a/drivers/char/agp/intel-gtt.c
+++ b/drivers/char/agp/intel-gtt.c
@@ -1225,9 +1225,9 @@ static int intel_i915_get_gtt_size(void)
                        break;
                default:
                        dev_info(&agp_bridge->dev->dev,
-                                "unknown page table size 0x%x, assuming 512KB\n",
+                                "unknown page table size 0x%x, assuming 256KB\n",
                                (gmch_ctrl & G33_PGETBL_SIZE_MASK));
-                       size = 512;
+                       size = 256;
                }
        } else {
                /* On previous hardware, the GTT size was just what was
Comment 5 Gordon Jin 2010-06-29 08:50:47 UTC
My impression is the gpu core in Q33/Q35 is exactly the same as G33. Zhenyu could confirm.
Comment 6 Kees Cook 2010-06-29 20:32:04 UTC
The original code path was effectively:

    if (IS_G33) gtt_map_size = 1024 * 1024; /* 1M on G33 */
    else        gtt_map_size = 256 * 1024;

So, if 8086:29b2 is a G33, the new code path it's hitting is:
    if G33_PGETBL_SIZE_1M, size = 1024
    else if G33_PGETBL_SIZE_2M, size = 2048
    else size = 512
otherwise:
    size = agp_bridge->driver->fetch_size();


Changing 512 to 256 would not restore the old behavior since 256 would only have been set for the non-G33 case.  The question should probably be "what does agp_bridge->driver->fetch_size() return?"
Comment 7 Kees Cook 2010-06-29 20:42:41 UTC
Created attachment 26972 [details]
instrumentation of i915 changes

I instrumented my kernel with the attached patch, and it reports:

[    1.025155] agpgart-intel 0000:00:00.0: Intel Q35 Chipset
[    1.025192] i915: G33 detected.
[    1.025194] i915: G33_PGETBL_SIZE_2M detected.
[    1.025196] i915: forcing size to 1024 anyway.
[    1.027041] agpgart-intel 0000:00:00.0: detected 6140K stolen memory
[    1.052314] agpgart-intel 0000:00:00.0: AGP aperture is 256M @ 0xd0000000
Comment 8 Rafael J. Wysocki 2010-07-01 21:22:19 UTC
First-Bad-Commit : f1befe71fa7a79ab733011b045639d8d809924ad
Comment 9 Rafael J. Wysocki 2010-07-09 20:57:52 UTC
On Friday, July 09, 2010, Kees Cook wrote:
> Hi Rafael,
> 
> On Fri, Jul 09, 2010 at 01:41:39AM +0200, Rafael J. Wysocki wrote:
> > The following bug entry is on the current list of known regressions
> > from 2.6.34.  Please verify if it still should be listed and let the
> tracking team
> > know (either way).
> 
> Yup, this is still a regression.  The fix is trivial (see below).
> 
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=16294
> > Subject             : [Q35 bisected] hang at init of i915 driver
> > Submitter   : Kees Cook <kees@outflux.net>
> > Date                : 2010-06-25 18:20 (14 days old)
> 
> diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
> index 9344216..0d9007d 100644
> --- a/drivers/char/agp/intel-gtt.c
> +++ b/drivers/char/agp/intel-gtt.c
> @@ -1229,6 +1229,7 @@ static int intel_i915_get_gtt_size(void)
>                               (gmch_ctrl & G33_PGETBL_SIZE_MASK));
>                       size = 512;
>               }
> +             size = 1024;
>       } else {
>               /* On previous hardware, the GTT size was just what was
>                * required to map the aperture.
Comment 10 Rafael J. Wysocki 2010-07-09 21:49:54 UTC
Handled-By : Tim Gardner <tim.gardner@canonical.com>
Patch : https://patchwork.kernel.org/patch/111130/
Comment 11 Rafael J. Wysocki 2010-07-29 22:02:25 UTC
Fixed by commit e7b96f28c58ca09f15f6c2e8ccbb889a30fab4f7 .
Comment 12 Daniel Vetter 2010-09-08 20:46:01 UTC
The patch that supposedly fixes this bug is unfortunately itself rather broken.
There's a new patch available in the drm-intel-next branch available at:

http://cgit.freedesktop.org/~ickle/drm-intel/

Please retest to ensure that this does not break your systems - I'd like to avoid a regression fix for a regression fix for a regression fix ;)

-Daniel