Bug 86551

Summary: Screen freezes on boot at "fb: switching to inteldrmfb from simple"
Product: Drivers
Reporter: Mike Auty (mike.auty)
Component: Video(DRI - Intel)
Assignee: Jani Nikula (jani.nikula)
Status: RESOLVED CODE_FIX
Severity: normal
CC: blinxwang, brent.saner, dex, evangelos, imre.deak, intel-gfx-bugs, jani.nikula, leho
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.17, 3.17.1
Subsystem:
Regression: No
Bisected commit-id:
Attachments:
  dmesg-ramoops-0
  opregion with applied patch
  drm/i915: safeguard against too high minimum brightness

Description Mike Auty 2014-10-19 22:59:55 UTC
I have a Samsung 700T tablet with an "Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09)" according to lspci.  Under <= 3.16.3, it boots from UEFI and displays at 1920x1080 (the native resolution of the display).  Shortly after startup and some kernel output, the screen flashes black, then comes back, and the main boot (kernel and post-kernel) continues.

Under >= 3.17, when booting, the screen will freeze at the line "fb: switching to inteldrmfb from simple" and no black flash or change in the screen will happen.

Conducting a git bisection revealed the first bad commit to be 6dda730e55f412a6dfb181cae6784822ba463847 (where the message was "[drm] Replacing VGA console driver").

Both 3.17 and 3.17.1 are affected.  The machine seems unresponsive (and never starts a system logger, so I can't see any further messages, or why it doesn't continue on to the main system).  If I disable modesetting by default it boots fine, but then X won't start.

I'm happy to provide additional information or run tests, but I'm not sure what else to provide at this point.
Comment 1 Imre Deak 2014-10-20 11:52:35 UTC
Could you try to get a log using serial console or netconsole? You can find instructions to set these up in Documentation/serial-console.txt and Documentation/networking/netconsole.txt.
Comment 2 Mike Auty 2014-10-20 20:54:55 UTC
I'm sorry, I have tried, but the tablet has no ethernet port and no serial port.  I tried a USB ethernet adaptor (which the machine detects fine) with netconsole on a kernel I *can* get to boot, but I'm not getting any packets (netconsole is compiled into the kernel, and I've tried it both with no parameters other than a target IP, and also giving it ports, IPs and the interface name, using the stable naming scheme).  As for serial console, I have one USB serial adaptor and no machines with a native serial port, so I have no way to see what's coming out of the serial adaptor (if anything).  So I'm afraid I have no way of recovering the data you've asked for.

Are there any other tests I can run, or further information I can provide that might help?  I was hoping that the specific git commit that I mentioned in my initial comment might help narrow down the issue?
Comment 3 Imre Deak 2014-10-21 09:57:15 UTC
(In reply to Mike Auty from comment #2)
> I'm sorry, I have tried, but the tablet has no ethernet port, and no serial
> port, so I've tried using a USB ethernet adaptor (which the machine detects
> fine), but I've tried netconsole on a kernel I *can* get to boot, and I'm
> not getting any packets (netconsole is compiled in to the kernel, and I've
> tried it both with no parameters other than a target IP, and also giving it
> a ports, IPs and the interface name, using the stable naming scheme).  As
> for serial console, I have one USB serial adaptor, and no machines with a
> native serial port, so no way to see what's coming out of the serial adaptor
> (if anything).  So I'm afraid I have no way of recovering the data you've
> asked for.

Ok, I know the pain of getting logs from contemporary systems:) Most (all?) of the USB ethernet adapters won't work since they lack netpolling support which is needed by netconsole. Other possibilities you could check:

- serial port exposed via a USB device port (if your machine has any),
  that you can access via a normal USB cable and the FTDI driver on
  your host

- ramoops. See Documentation/ramoops.txt and
  https://bugs.freedesktop.org/show_bug.cgi?id=76520#c27 to set it up.
  The RAM contents may get reset if you reboot by power off/on, so you may
  need the following kernel options too:
  CONFIG_LOCKUP_DETECTOR
  CONFIG_BOOTPARAM_HARDLOCKUP_PANIC
  CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC
  CONFIG_BOOTPARAM_HUNG_TASK_PANIC
  
  and boot with the kernel parameter panic=5.

> Are there any other tests I can run, or further information I can provide
> that might help?  I was hoping that the specific git commit that I mentioned
> in my initial comment might help narrow down the issue?

That commit is "drm/i915: respect the VBT minimum backlight brightness" and does some backlight level scaling. I can't see how that could lead to a lockup, so could you try its parent commit 1de6068eb several times to be sure the bisect result is correct and it's not just some timing issue?

It's just a wild guess, but could you check whether the patch at
https://lkml.org/lkml/2014/10/2/254 helps?

Also please give a try to the latest drm-intel-nightly kernel:
git://anongit.freedesktop.org/drm-intel drm-intel-nightly branch.
Comment 4 Mike Auty 2014-10-21 13:11:02 UTC
Created attachment 154401 [details]
dmesg-ramoops-0

Success!!!

The ramoops worked, I got a console-ramoops, dmesg-ramoops-0.enc.z, and dmesg-ramoops-1.enc.z (after unpacking by removing the pstore time and "no errors detected" lines, the -0 and -1 seem identical and contain the data of console-ramoops, so only attaching -0).

As best I can tell, it's a divide by 0 in the scale function (which is new in the git commit I indicated), but I haven't figured out the exact source or path causing it.

Please let me know if you'd like me to run any of the other tests, or have any patches you'd like me to try?
Comment 5 Imre Deak 2014-10-22 10:34:38 UTC
(In reply to Mike Auty from comment #4)
> Created attachment 154401 [details]
> dmesg-ramoops-0
> 
> Success!!!
> 
> The ramoops worked, I got a console-ramoops, dmesg-ramoops-0.enc.z, and
> dmesg-ramoops-1.enc.z (after unpacking by removing the pstore time and "no
> errors detected" lines, the -0 and -1 seem identical and contain the data of
> console-ramoops, so only attaching -0).
> 
> From the best I can tell, it's a divide by 0 in the scale function (which is
> new in the git commit I indicated), but I haven't figured out the exact
> source or path causing it.
> 
> Please let me know if you'd like me to run any of the other tests, or have
> any patches you'd like me to try?

Thanks, it was helpful and your bisect result was correct. CC'ing Jani. I found a similar report:

https://lkml.org/lkml/2014/10/3/415
Comment 6 Jani Nikula 2014-10-22 12:16:36 UTC
From the lkml report:

[drm:parse_lfp_backlight] VBT backlight PWM modulation frequency 210 Hz, active high, min brightness 255, level 255

we stumble because of an apparent confusion over what the VBT means by min brightness. Asking the BIOS team.

Something like this should work around the issue:

diff --git a/drivers/gpu/drm/i915/intel_panel.c b/drivers/gpu/drm/i915/intel_panel.c
index e18b3f49074c..33d14dcc1019 100644
--- a/drivers/gpu/drm/i915/intel_panel.c
+++ b/drivers/gpu/drm/i915/intel_panel.c
@@ -1098,7 +1098,7 @@ static u32 get_backlight_min_vbt(struct intel_connector *connector)
 	WARN_ON(panel->backlight.max == 0);
 
 	/* vbt value is a coefficient in range [0..255] */
-	return scale(dev_priv->vbt.backlight.min_brightness, 0, 255,
+	return scale(dev_priv->vbt.backlight.min_brightness, 0, 512,
 		     0, panel->backlight.max);
 }
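
To illustrate why a VBT minimum of 255 is a problem, here is a minimal userspace sketch of linear range scaling. It is not the kernel code: scale_range() and range_is_degenerate() are hypothetical names, the hardware maximum of 4000 used in the note below is made up, and the idea that the resulting hardware range is later used as the source range of another rescale is an inference from the log line and the divide-by-zero report, not something stated in this thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace sketch of a linear range-scaling helper, modelled loosely on
 * the scale() call in the diff above. Hypothetical, not the kernel code.
 */
static uint32_t scale_range(uint32_t val,
                            uint32_t src_min, uint32_t src_max,
                            uint32_t dst_min, uint32_t dst_max)
{
    /* A zero-width source range would make the division below trap. */
    assert(src_max > src_min);
    assert(dst_max >= dst_min);

    /* Clamp defensively so the subtraction cannot underflow. */
    if (val < src_min)
        val = src_min;
    if (val > src_max)
        val = src_max;

    /* Widen to 64 bits so the multiplication cannot overflow. */
    uint64_t out = (uint64_t)(val - src_min) * (dst_max - dst_min);
    return (uint32_t)(out / (src_max - src_min)) + dst_min;
}

/*
 * Returns 1 when the hardware range [hw_min..hw_max] has zero width, i.e.
 * when using it as the source range of a later rescale would divide by 0.
 */
static int range_is_degenerate(uint32_t hw_min, uint32_t hw_max)
{
    return hw_max == hw_min;
}
```

With an assumed hardware maximum of 4000, scale_range(255, 0, 255, 0, 4000) yields 4000, so the minimum equals the maximum and the range is degenerate. With the workaround's source range of [0..512], the same input yields 1992, which leaves a usable range at the cost of halving every VBT minimum.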
Comment 7 Mike Auty 2014-10-22 19:36:25 UTC
Just to confirm that the patch does allow the display to boot past the point it was freezing and complete the boot.  Xorg doesn't seem to start properly, but that'll be a different issue...
Comment 8 Jani Nikula 2014-10-23 16:22:25 UTC
*** Bug 85171 has been marked as a duplicate of this bug. ***
Comment 9 Daniel Exner 2014-10-23 19:38:33 UTC
I tried the proposed patch (changing the upper bound of the source range passed to scale() from 255 to 512) and I can confirm that it indeed fixed my issue.

Xorg started fine, so I guess this was the only issue :)
Comment 10 Jani Nikula 2014-10-29 14:02:46 UTC
Please post /sys/kernel/debug/dri/0/i915_opregion.
Comment 11 Daniel Exner 2014-10-29 17:07:13 UTC
Created attachment 155811 [details]
opregion with applied patch

I attached /sys/kernel/debug/dri/0/i915_opregion with applied patch. Should I do the same with vanilla kernel?
Comment 12 Jani Nikula 2014-10-30 08:50:08 UTC
(In reply to Daniel Exner from comment #11)
> I attached /sys/kernel/debug/dri/0/i915_opregion with applied patch. Should
> I do the same with vanilla kernel?

No need, thanks.
Comment 13 Jani Nikula 2014-11-05 12:47:39 UTC
Created attachment 156641 [details]
drm/i915: safeguard against too high minimum brightness

Please test this patch aimed for upstream inclusion. Thanks.
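
For readers without access to the attachment, one way such a safeguard could look is sketched below in userspace C. This is an assumption about the general approach (capping the VBT minimum before scaling), not a copy of the attached patch: sanitize_vbt_min(), backlight_min_from_vbt() and the cap value VBT_MIN_BRIGHTNESS_CAP are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cap: let the minimum be at most about a quarter of 255. */
#define VBT_MIN_BRIGHTNESS_CAP 64u

/* Clamp a VBT coefficient in [0..255] to the cap above. */
static uint32_t sanitize_vbt_min(uint32_t vbt_min)
{
    assert(vbt_min <= 255);
    return vbt_min > VBT_MIN_BRIGHTNESS_CAP ? VBT_MIN_BRIGHTNESS_CAP
                                            : vbt_min;
}

/*
 * Scale the sanitized coefficient from [0..255] to [0..hw_max]. Because the
 * coefficient is capped below 255, the result is always below hw_max, so
 * the hardware range [min..max] can never collapse to zero width.
 */
static uint32_t backlight_min_from_vbt(uint32_t vbt_min, uint32_t hw_max)
{
    assert(hw_max > 0);
    uint64_t scaled = (uint64_t)sanitize_vbt_min(vbt_min) * hw_max;
    return (uint32_t)(scaled / 255);
}
```

Unlike widening the source range to 512, a cap of this kind leaves sane VBT values untouched and only alters the out-of-range ones.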
Comment 14 Mike Auty 2014-11-07 03:01:42 UTC
I've now tested this patch and can confirm that it fixes the problem for me.  Thanks!
Comment 15 Jani Nikula 2014-11-07 07:45:01 UTC
Fixed in drm-intel-fixes by

commit e1c412e75754ab7b7002f3e18a2652d999c40d4b
Author: Jani Nikula <jani.nikula@intel.com>
Date:   Wed Nov 5 14:46:31 2014 +0200

    drm/i915: safeguard against too high minimum brightness

which will eventually find its way into the v3.18 and v3.17 stable kernels.

Thanks for the report and testing.