Bug 88301 - [PNV] Kernel 3.16 boot hang
Summary: [PNV] Kernel 3.16 boot hang
Status: CLOSED OBSOLETE
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - Intel)
Hardware: Intel Linux
Importance: P1 high
Assignee: intel-gfx-bugs@lists.freedesktop.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-11-16 11:33 UTC by mfld.fr
Modified: 2015-05-07 17:43 UTC

See Also:
Kernel Version: >= 3.15-rc1
Subsystem:
Regression: Yes
Bisected commit-id:


Description mfld.fr 2014-11-16 11:33:36 UTC
After switching from kernel 3.14.14 to 3.16.5, the i915 driver no longer works and hangs the kernel boot with a blank, black screen.

Hardware:
  Asus eeePC 1001PX
  Platform NM10 / ICH7
  Graphic chip GMA3150

Problem isolation:
  Remove i915 from kernel .config and rebuild: no more kernel boot hang

Workarounds:
  1- Keep i915 out and replace it with the generic VESA driver (degraded resolution)
  2- Keep same kernel configuration and downgrade to 3.14 line
Comment 1 Daniel Vetter 2014-11-18 10:11:03 UTC
Since this is a regression, can you please try to bisect the offending commit?
Comment 2 mfld.fr 2014-11-18 19:57:17 UTC
Unfortunately, I am using the Gentoo distribution, which provides only the 3.14.14 or 3.16.5 tainted versions of the kernel in the streamline (amd64).

So let me try some intermediate, non-tainted versions from kernel.org... Do you have any suggestions for which version numbers would be best to try first for the i915 driver?
Comment 3 Daniel Vetter 2014-11-19 09:35:11 UTC
Generally the versions should match well, as long as you use your distro's kernel configuration. But of course before you start the bisect you have to confirm that the git versions have the same issue.
Comment 4 mfld.fr 2015-01-29 19:50:33 UTC
Tested on genuine kernel 3.16.1 (from kernel.org):
No problem with i915 driver.
Continuing...
Comment 5 mfld.fr 2015-01-30 12:33:40 UTC
Tested on other genuine kernels (from kernel.org):
- 3.16.2: Blank screen at boot: FAIL
- 3.16.1: Correct video mode switch at boot: SUCCESS

So the regression occurred between 3.16.1 and 3.16.2.
Changed Kernel Version in bug header from 3.16.5 to 3.16.2.

Using the same .config (except the line that contains the 3.16.x version string).

Waiting for instructions if you want more testing on my side...
Comment 6 mfld.fr 2015-01-30 12:35:38 UTC
Also tested on genuine kernel 3.16.3 (from kernel.org):
Same problem as with 3.16.2.
Comment 7 Jani Nikula 2015-02-11 14:26:57 UTC
There were no i915 related changes in drivers/gpu or drivers/acpi between v3.16.1 and v3.16.2. If .1 is good and .2 is bad, it should be fast to bisect to the offending commit. Please do that.
Comment 8 mfld.fr 2015-02-16 17:54:48 UTC
OK to bisect, but I have an issue while cloning the 3.16 branch before doing so:

mfldhost # git clone -n -b linux-3.16.y git://git.kernel.org/linux/kernel/git/stable/linux-stable
Cloning into 'linux-stable'...
fatal: remote error: access denied or repository not exported: /linux/kernel/git/stable/linux-stable

Could you please tell me what I am doing wrong here ? Thanks !
Comment 9 Jani Nikula 2015-02-16 18:17:59 UTC
I'm using the repo at
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git


PS. I generally like to use a clone of Linus' kernel tree, and then add other trees as remotes, e.g.

$ cd linux
$ git remote add stable git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
$ git fetch stable
$ git bisect start
$ git bisect good v3.16.1
$ git bisect bad v3.16.2

etc.

YMMV, obviously.
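
For readers unfamiliar with the workflow above: a bisect session begins with `git bisect start`, after which one known-good and one known-bad revision are marked and Git picks the revisions to test in between. The following is only a minimal sketch on a throwaway repository, not the kernel tree. The temporary repository, the tags `c1`..`c8`, the file `state.txt` and the `BUG` marker are all hypothetical stand-ins for kernel revisions and for the manual "build, boot, look at the screen" test.

```shell
#!/bin/sh
# Toy demonstration of `git bisect`. Everything below (repository, tags,
# state.txt, the "BUG" marker) is a hypothetical stand-in for a kernel tree
# and a boot test.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "bisect@example.invalid"
git config user.name "Bisect Demo"

# Create 8 commits; commit 6 introduces the "regression".
i=1
while [ "$i" -le 8 ]; do
    echo "rev $i" > state.txt
    if [ "$i" -ge 6 ]; then echo "BUG" >> state.txt; fi
    git add state.txt
    git commit -q -m "commit $i"
    git tag "c$i"
    i=$((i + 1))
done

git bisect start >/dev/null      # open the session before marking anything
git bisect bad c8 >/dev/null     # newest revision known to fail
git bisect good c1 >/dev/null    # oldest revision known to work

# Automated stand-in for the boot test: exit 0 means good, exit 1 means bad.
git bisect run sh -c '! grep -q BUG state.txt' >/dev/null

# refs/bisect/bad now points at the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad: $first_bad"
```

With a real kernel the test step cannot be scripted this easily, so each revision would be built, booted and then marked by hand with `git bisect good` or `git bisect bad`.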
Comment 10 mfld.fr 2015-02-27 18:40:31 UTC
Thanks for the exact GIT repository URL, it works better on my side now.

Unfortunately, I realized that I made a mistake while testing the archived kernel versions (I forgot to clean up the build directory before compiling...), so my previous report on 3.16.1 and 3.16.2 is not correct.

I restarted by bisecting with GIT between v3.14.31 (good) and v3.16.1 (actually bad), will tell you the result... stay tuned !
Comment 11 mfld.fr 2015-02-28 15:51:07 UTC
Result from GIT bisect: the first bad commit is: d978ef14456a38034f6c0e94a794129501f89200

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=d978ef14456a38034f6c0e94a794129501f89200

Now linked to i915.
Comment 12 Jani Nikula 2015-03-02 12:59:58 UTC
That's likely fixed by

commit edd586fe705e819bc711b5ed7194a0b6f9f1a7e1
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Apr 23 08:54:31 2014 +0100

    drm/i915: Discard BIOS framebuffers too small to accommodate chosen mode

in v3.15-rc4. So you need to try v3.15-rc4 or later.
Comment 13 mfld.fr 2015-03-02 18:11:55 UTC
I am sorry, but I already tested 3.16.1 / .2 / .3 / .5 (see above) and the problem occurs with those later releases.

I also just checked with GIT v3.15-rc4, and the boot still hangs with the black screen at the time of the video mode switching from VGA to native resolution.

Any suggestion ?
Comment 14 mfld.fr 2015-03-02 18:18:31 UTC
Updated kernel version field with >= 3.15-rc1.
Comment 15 Jani Nikula 2015-03-03 13:07:14 UTC
My point was that your bisect probably pointed at the wrong commit.
Comment 16 mfld.fr 2015-03-04 18:37:31 UTC
I just tested the parent of the commit d978ef14456a38034f6c0e94a794129501f89200, and I got a failed boot. So you are right, my bisect result is bad... I wonder what I did wrong...

I also tested v3.15-rc1 (which contains the ...9200 commit), and it failed at boot.
Comment 17 Jani Nikula 2015-03-05 08:22:56 UTC
The problem was that you hit another bug during bisect. If you have verified that the parent of d978ef14456a38034f6c0e94a794129501f89200 is bad, you can flag it as bad with git bisect and you should be good to continue with the hunt for the original bug.
Comment 18 mfld.fr 2015-03-17 17:47:27 UTC
Hello, I restarted the bisect from the latest known tags on the master
branch, and checked this time the path taken by the bisect to avoid the
previous miss.

c9eaa44  bad   3.15-rc1
455c6fd  good  3.14

Here is the bisect path I followed:

Using Git bisect and checking that it stays on the master branch:

cd6362b  good
d2b150d  good
5fb6b95  good  last

Here Git escaped from the master branch, so back to manual bisect:

79d2d21  bad
4162877  bad
75ff24f  bad
e9f37d3  bad   first

Commit e9f37d3 is just after 5fb6b95, and is:
Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
the merge of branch drm-next into master.

Branch 'drm-next' based on cfbf8d4 (tag 3.14-rc4) and merged from c39b069.

Manual bisect on the 'drm-next' branch:

cfbf8d4  good
c39b069  bad
8fa6a9e  good
d9961b2  bad
bfe8b57  good
66e514c  bad   first
2844ea3  good  last

Commit 66e514c is just after 2844ea3 and is:
Merge tag 'drm-intel-next-2014-03-21' of
git://anongit.freedesktop.org/drm-intel into drm-next

Branch 'drm-intel-next-queued' based on 319e2e3 (tag 3.13-rc4) and merged
from 698b313.

Manual bisect on the 'drm-intel-next-queued' branch:

319e2e3  good
698b313  bad
a0bae57  good
6a68735  good
6c7fba0  bad
6d129be  good
46f297f  good
ff2652e  bad
d978ef1  bad   first
4c6baa5  good  last

Commit d978ef1 is just after 4c6baa5 and is:
drm/i915: Wrap the preallocated BIOS framebuffer
and preserve for KMS fbcon v12

Same result as before... so double checked again:

4c6baa5  now bad !!!

Looks like there is another condition that triggers the failure...
Let us assume it is the cold or warm reboot after building the kernel,
and that warm reboot gives a false good for previous tests.

So now always testing from power off:

46f297f  good
1ad292b  good  last
4c6baa5  bad   first

Commit 4c6baa5 is just after 1ad292b and is:
drm/i915: get_plane_config support for ILK+ v3

I think this time we have a good candidate.
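
When an earlier mark turns out to be a false "good", as with the warm-reboot results above, the session does not have to be redone by hand: `git bisect log` writes out the journal of marks and `git bisect replay` re-applies an edited copy. This is a minimal sketch on a throwaway repository under stated assumptions. The tags `c1`..`c8`, the file `state.txt`, the `BUG` marker and the mistaken mark on `c5` are hypothetical and only imitate the situation described in this comment.

```shell
#!/bin/sh
# Toy demonstration of correcting a wrong bisect mark with
# `git bisect log` and `git bisect replay`. Repository, tags and the "BUG"
# marker are hypothetical stand-ins for kernel revisions and boot tests.
set -e
repo=$(mktemp -d)
log=$(mktemp)
cd "$repo"
git init -q
git config user.email "bisect@example.invalid"
git config user.name "Bisect Demo"

# Create 8 commits; the regression really enters at commit 4.
i=1
while [ "$i" -le 8 ]; do
    echo "rev $i" > state.txt
    if [ "$i" -ge 4 ]; then echo "BUG" >> state.txt; fi
    git add state.txt
    git commit -q -m "commit $i"
    git tag "c$i"
    i=$((i + 1))
done

git bisect start >/dev/null
git bisect bad c8 >/dev/null
git bisect good c1 >/dev/null
git bisect good c5 >/dev/null    # mistaken mark: a false "good"

# Save the journal, drop every line that mentions the mistaken commit,
# then replay the corrected journal (replay restarts the session).
git bisect log > "$log"
wrong=$(git rev-parse c5)
grep -v "$wrong" "$log" > "$log.fixed"
git bisect replay "$log.fixed" >/dev/null

# Continue with the (here automated) test: exit 0 = good, exit 1 = bad.
git bisect run sh -c '! grep -q BUG state.txt' >/dev/null
first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad: $first_bad"
```

Without the correction, the search would have been confined to the revisions after `c5` and could not have reached the commit that actually introduced the failure.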
Comment 19 mfld.fr 2015-05-04 09:41:42 UTC
Hello, any update from owner side ?
Comment 20 Jani Nikula 2015-05-04 10:09:37 UTC
(In reply to mfld.fr from comment #19)
> Hello, any update from owner side ?

So this is the bad commit?

commit 4c6baa595f4e8516bb9cf0081765f90856aa2fe3
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Fri Mar 7 08:57:50 2014 -0800

    drm/i915: get_plane_config support for ILK+ v3

Jesse, any ideas except to try the latest kernels?
Comment 21 mfld.fr 2015-05-04 10:14:13 UTC
This time I checked and rechecked from the power-off state: this is definitely the offending commit that hangs the kernel boot with a blank screen on my hardware (see the first comment for the HW description).
Comment 22 Jesse Barnes 2015-05-04 15:22:35 UTC
Hm, that shouldn't have affected PNV at all...  does the problem go away if you comment out the lines it added in intel_init_display()?

@@ -10875,6 +10935,7 @@ static void intel_init_display(struct drm_device *dev)
 
        if (HAS_DDI(dev)) {
                dev_priv->display.get_pipe_config = haswell_get_pipe_config;
+               dev_priv->display.get_plane_config = ironlake_get_plane_config;
                dev_priv->display.crtc_mode_set = haswell_crtc_mode_set;
                dev_priv->display.crtc_enable = haswell_crtc_enable;
                dev_priv->display.crtc_disable = haswell_crtc_disable;
@@ -10882,6 +10943,7 @@ static void intel_init_display(struct drm_device *dev)
                dev_priv->display.update_plane = ironlake_update_plane;
        } else if (HAS_PCH_SPLIT(dev)) {
                dev_priv->display.get_pipe_config = ironlake_get_pipe_config;
+               dev_priv->display.get_plane_config = ironlake_get_plane_config;
                dev_priv->display.crtc_mode_set = ironlake_crtc_mode_set;
                dev_priv->display.crtc_enable = ironlake_crtc_enable;
                dev_priv->display.crtc_disable = ironlake_crtc_disable;

I don't see why it would...
Comment 23 mfld.fr 2015-05-07 17:42:03 UTC
Hello all,

Even though the problem is reproducible, nobody here seems to have any idea where it could come from, and we have all spent too much time on a problem that occurs on a rather old and specific HW (more than 7 years old today). The not-too-bad solution is to stay on the 3.14 branch, which works very well, so I am closing this case. Thanks to everyone who had a look at this.
