Bug 16891 - Kernel panic while loading intel module during boot
Summary: Kernel panic while loading intel module during boot
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - Intel)
Hardware: All Linux
Importance: P1 high
Assignee: Daniel Vetter
URL:
Keywords:
Depends on:
Blocks: 16055
Reported: 2010-08-24 13:19 UTC by Anisse Astier
Modified: 2011-01-23 16:12 UTC

See Also:
Kernel Version: 2.6.35.4
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
Dmesg output of the crash (35.30 KB, text/plain)
2010-08-24 13:19 UTC, Anisse Astier
lspci -vvvnn of the machine from a working 2.6.34.1 kernel (21.85 KB, text/plain)
2010-08-26 15:41 UTC, Anisse Astier
Fix for intel-gtt 2.6.35 regression (4.33 KB, patch)
2010-10-05 10:10 UTC, Anisse Astier

Description Anisse Astier 2010-08-24 13:19:01 UTC
Created attachment 27791
Dmesg output of the crash

The kernel panics during boot while loading the intel module (in the initramfs).

This is an Atom-based machine: MSI AE1920 with video card 8086:a001.

It was working with kernel 2.6.34.1; I might bisect when I have some time.

Dmesg output is attached. I captured it with netconsole.
Comment 1 Anisse Astier 2010-08-26 15:38:28 UTC
Someone suggested on #intel-gfx that the problem might be that AGP was built as a module. After setting CONFIG_AGP=y and CONFIG_AGP_INTEL=y, the problem still happened, only earlier in boot (obviously).

I bisected the problem to this commit:


commit e7b96f28c58ca09f15f6c2e8ccbb889a30fab4f7
Author: Tim Gardner <tim.gardner@canonical.com>
Date:   Fri Jul 9 14:48:50 2010 -0600

    agp/intel: Use the correct mask to detect i830 aperture size.
    
    BugLink: https://bugs.launchpad.net/bugs/597075
    
    commit f1befe71fa7a79ab733011b045639d8d809924ad introduced a
    regression when detecting aperture size of some i915 adapters, e.g.,
    those on the Intel Q35 chipset.
    
    The original report: https://bugzilla.kernel.org/show_bug.cgi?id=15733
    The regression report: https://bugzilla.kernel.org/show_bug.cgi?id=16294
    
    According to the specification found at
    http://intellinuxgraphics.org/VOL_1_graphics_core.pdf, the PCI config
    space register I830_GMCH_CTRL is a mirror of GMCH Graphics
    Control. The correct macro for isolating the aperture size bits is
    therefore I830_GMCH_GMS_MASK along with the attendant changes to the
    case statement.
    
    Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
    Tested-by: Kees Cook <kees.cook@canonical.com>
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Eric Anholt <eric@anholt.net>
    Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
    Signed-off-by: Eric Anholt <eric@anholt.net>
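
As a minimal sketch of what the commit message above describes (not the kernel's actual code): isolating a field from a config-space register value is a bitwise AND with that field's mask, so applying the wrong mask selects different bits and a case statement then sees different values. The macro names, mask values, and helper functions below are assumptions made for illustration; the real definitions live in the kernel's intel-agp headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical masks for this sketch only; not copied from the kernel. */
#define EXAMPLE_MEM_MASK 0x01u /* assumed single-bit field at bit 0 */
#define EXAMPLE_GMS_MASK 0x70u /* assumed three-bit field at bits 6:4 */

/* Isolate a register field in place (unshifted), the form a switch
 * statement would compare against pre-shifted case constants. */
static uint16_t gmch_field(uint16_t gmch_ctrl, uint16_t mask)
{
    return gmch_ctrl & mask;
}

/* Return the same field shifted down so its lowest bit sits at bit 0. */
static unsigned gmch_field_index(uint16_t gmch_ctrl, uint16_t mask)
{
    assert(mask != 0);
    unsigned shift = 0;
    while (((mask >> shift) & 1u) == 0)
        shift++;
    return (unsigned)(gmch_ctrl & mask) >> shift;
}
```

With a register value of 0x0030, `gmch_field(0x0030, EXAMPLE_GMS_MASK)` keeps bits 6:4 while `gmch_field(0x0030, EXAMPLE_MEM_MASK)` keeps only bit 0, so the two masks steer a case statement to different branches for the same hardware value.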
Comment 2 Anisse Astier 2010-08-26 15:41:31 UTC
Created attachment 28041
lspci -vvvnn of the machine from a working 2.6.34.1 kernel
Comment 3 Anisse Astier 2010-08-30 09:05:44 UTC
The bug is still reproducible with 2.6.35.4.
Reverting the offending commit allows the boot to reach completion (no hang), but then I have other problems with input devices in Xorg and Ethernet networking. It's the same if I revert f1befe71fa7a79ab733011b045639d8d809924ad, which introduced the previous regression. I don't know if the problems are related.
Comment 4 Daniel Vetter 2010-09-02 15:00:00 UTC
Can you try my intel-gtt rework branch?

http://cgit.freedesktop.org/~danvet/drm/log/?h=intel_gtt_rework

It contains a patch that should fix problems due to e7b96f28c58ca09f15f6c.
Comment 5 Anisse Astier 2010-09-02 15:03:15 UTC
I can, and will, but it might take a few days.
Thanks
Comment 6 Anisse Astier 2010-09-08 09:30:20 UTC
(In reply to comment #3)
> but then I have other problems with input devices in Xorg and Ethernet
> networking. 
These problems were in userland on my side.

Your branch fixes this specific bug, and boot is able to reach completion.

Tested-by: Anisse Astier <anisse@astier.eu>
Comment 7 Chris Wilson 2010-09-11 09:23:38 UTC
As it stands the code is in drm-intel-next, but I think Daniel is planning to submit a stable patch as well.
Comment 8 Andrew Morton 2010-09-24 22:38:34 UTC
(In reply to comment #7)
> As it stands the code is in drm-intel-next, but I think Daniel is planning to
> submit a stable patch as well.

Did he?
Comment 9 Anisse Astier 2010-09-29 09:07:23 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > As it stands the code is in drm-intel-next, but I think Daniel is planning to
> > submit a stable patch as well.
> 
> Did he?

AFAIK, not yet.
Comment 10 Daniel Vetter 2010-09-30 15:39:34 UTC
> --- Comment #9 from Anisse Astier <anisse@astier.eu>  2010-09-29 09:07:23 ---
> (In reply to comment #8)
> > (In reply to comment #7)
> > > As it stands the code is in drm-intel-next, but I think Daniel is planning to
> > > submit a stable patch as well.
> > 
> > Did he?
> 
> AFAIK, not yet.

[Sorry for the late reply; it looks like the bz.k.org failure ate my response.]

Nope, not yet. I'd like to give this some vetting time in -linus. It's
marked cc: stable, so Greg KH will remind me in time to backport. I'd
simply like to avoid yet another regression in -stable over the same
problem - if I'm counting correctly, my patch is trial number three :(
The current solution seems to be the right one, but that's also what the
changelogs of the previous two patches claimed.

-Daniel
Comment 11 Rafael J. Wysocki 2010-10-04 19:42:30 UTC
On Monday, October 04, 2010, Anisse Astier wrote:
> On Sun,  3 Oct 2010 23:53:02 +0200 (CEST), "Rafael J. Wysocki" <rjw@sisk.pl>
> wrote :
> 
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.34 and 2.6.35.
> > 
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.34 and 2.6.35.  Please verify if it still should
> > be listed and let the tracking team know (either way).
> > 
> > 
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16891
> > Subject   : Kernel panic while loading intel module during boot
> > Submitter : Anisse Astier <anisse@astier.eu>
> > Date      : 2010-08-24 13:19 (41 days old)
> > 
> > 
> 
> This bug is still valid, and should be listed as a regression.
> I tried to upload on bugzilla a patch authored by Daniel Vetter that fixes
> the problem, but then bugzilla went into blackhole mode.
Comment 12 Anisse Astier 2010-10-05 10:10:25 UTC
Created attachment 32602
Fix for intel-gtt 2.6.35 regression

Patch fixing the problem.

It still needs a meaningful description and Daniel's Signed-off-by.
Comment 13 Tim Gardner 2010-10-11 07:17:43 UTC
The original respondent (Kees Cook) has built a kernel with this patch. He reports no regressions.
Comment 14 Rafael J. Wysocki 2010-10-11 20:25:12 UTC
Patch : https://bugzilla.kernel.org/attachment.cgi?id=32602
Handled-By : Anisse Astier <anisse@astier.eu>
Comment 15 Florian Mickler 2011-01-23 16:12:31 UTC
Fixed in 2.6.37-rc1 by
commit e5e408fc94595aab897f613b6f4e2f5b36870a6f
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Sat Aug 28 11:04:32 2010 +0200

    intel-gtt: fix gtt_total_entries detection
