Bug 95741

Summary: Broadwell graphics on Dell XPS13 (early 2015) loses horizontal sync on shift to tty console
Product: Drivers
Reporter: W Unruh (unruh)
Component: Video(DRI - Intel)
Assignee: drivers_video-other
Status: RESOLVED CODE_FIX
Severity: normal
CC: adrian, ben, brandon, bugzilla, intel-gfx-bugs, james, michael, nils.alex, soda, soren121, superm1, tasev.stefanoska
Priority: P3
Hardware: All
OS: Linux
Kernel Version: 4.0.0 RC4/5
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments:
  dmesg with drm.debug=14
  Another user's partial dmesg output on switch on 4.0 r1
  kernel bisect log
  Dmesg with drm.debug=14 good kernel
  Dmesg with drm.debug=14 bad kernel
  dmesg kernel 4.1-rc1 drm.debug=14

Description W Unruh 2015-03-27 17:43:40 UTC
Dell XPS13 with Broadwell-U graphics
00:00.0 Host bridge: Intel Corporation Broadwell-U Host Bridge -OPI (rev 09)
00:02.0 VGA compatible controller: Intel Corporation Broadwell-U Integrated Graphics (rev 09)
loses horizontal and vertical sync when one switches from X to a tty terminal (alt-ctrl-F2, say). It worked fine on the 3.19.2 kernel, and it works fine in the initial boot sequence (without the splash and quiet kernel parameters, so one sees the boot messages scroll by on tty1, and the terminal works well there). But when the system tries to bring up the terminal console, the text scrolls rapidly across the screen and slowly drifts vertically as well. Somehow the graphics modes are getting messed up in 4.0.0, which was not a problem in 3.19.2.
Also, when the system shuts down or reboots, the terminal screen is shown again, and again the horizontal sync is messed up.
This is on the laptop's HD screen.


(Sorry, I do not know what the right Component is so I chose Video(other))
Comment 1 W Unruh 2015-03-29 18:00:45 UTC
Another data point: I switched to tty2 and watched for about 2 min to see what would happen to the horizontally messed-up display. After about 2 min, the screen went blank. Attempts to go back to the X display failed: I would see a 0.1 sec flash of the X display, and then it would go black. The same was true if I did alt-ctrl-bksp to kill and restart X: I would get a flash of the kdm logon screen and then black. I.e., staying too long on that tty permanently messes up the graphics. Something is very wrong in the driver (i915?).
Comment 2 W Unruh 2015-03-30 06:50:18 UTC
This is almost certainly an Intel bug, so I am reassigning it to Intel video.
Comment 3 Jani Nikula 2015-04-01 12:01:34 UTC
dmesg with drm.debug=14 module parameter set all the way from boot might be useful.

Since it's a regression, can you do the bisect?
Comment 4 W Unruh 2015-04-02 01:07:49 UTC
I am afraid that I am not knowledgeable enough to do a bisect. I do know that the terminal change works under 3.19.2 (from Mageia 5) but does not work with kernel 4.0 rc4, rc5, or rc6 using Hans van Schoot's .config file.

wget http://forthescience.org/blog/wp-content/uploads/2015/03/linux-kernel_4.0rc4-config-ubu1404-xps13 -O .config

There are indications on the major.io site that already in rc2 it did not work. 

I will try to run with the drm.debug parameter when I get back to the machine.
I presume I just need to pass that as a grub boot parameter to the kernel.
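
A note on the boot parameter discussed in this comment: on many GRUB 2 based distributions the usual route is to append drm.debug=14 to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and regenerate the GRUB configuration, but the exact file and command vary by distribution and boot loader, so treat that as an assumption. After rebooting, a small check along these lines can confirm that the parameter actually reached the kernel. The helper name has_kernel_param is made up for illustration.

```shell
# Succeed if the kernel command line contains the given parameter.
# The command line defaults to the running kernel's /proc/cmdline; a
# string can be passed as the second argument instead (handy for testing).
has_kernel_param() {
    param=$1
    cmdline=${2-$(cat /proc/cmdline)}
    for word in $cmdline; do
        [ "$word" = "$param" ] && return 0
    done
    return 1
}

# Example use after a reboot:
#   has_kernel_param drm.debug=14 && echo "drm debug logging was requested"
```

The match is on whole words, so drm.debug=14 will not be confused with, say, drm.debug=140.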
Comment 5 W Unruh 2015-04-02 06:06:15 UTC
Created attachment 172981 [details]
dmesg with drm.debug=14

Here is the dmesg output with kernel 4.0 rc6 (with the _REV=2 and the ALSA patch for hda headset output). I booted with drm.debug=14 on the command line, logged in to X via kdm, then pressed alt-ctrl-F2 to go to the tty (with lost hor sync), went back to X with alt-ctrl-F1, then pressed alt-ctrl-F5 (again lost sync), and went back to X.
As mentioned, this works properly on 3.19.2 (no lost sync) but fails on 4.0 rc4, rc5, and rc6, with indications that it also fails (loses sync) on rc2, though I have not personally tested that.
Comment 6 Jani Nikula 2015-04-02 07:50:20 UTC
(In reply to W Unruh from comment #4)
> I am afraid that I am not knowledgeable enough to do a bisect.

Just saying, if you know how to configure and build kernels, you're almost there!

Basically git clone Linus' kernel tree, and do 'git bisect start v4.0-rc6 v3.19' (bad revision first, then the good one), and git will keep giving you kernel versions to try, and you tell it 'git bisect good' or 'git bisect bad' in return. See e.g. https://wiki.ubuntu.com/Kernel/KernelBisection
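
For readers following along, the bisect loop described in this comment might look roughly like the sketch below. It assumes a clone of Linus' tree and that v3.19 is good and v4.0-rc6 is bad on the machine in question; the wrapper name bisect_begin is invented here. One detail worth getting right is the argument order: git bisect start expects the bad revision before the good one.

```shell
# Begin bisecting between a known-good and a known-bad revision.
# Run inside a clone of the kernel tree, e.g.:
#   bisect_begin v3.19 v4.0-rc6
bisect_begin() {
    good=$1
    bad=$2
    # git bisect start takes the bad revision first, then the good one.
    git bisect start "$bad" "$good"
}

# After building and booting the kernel that git checked out, record the
# verdict and let git pick the next revision:
#   git bisect good    # the tty switch kept sync
#   git bisect bad     # the tty switch lost sync
#   git bisect skip    # this revision did not build or boot
# Repeat until git prints "<sha> is the first bad commit", then save the
# log and return to the original branch:
#   git bisect log > bisect.log
#   git bisect reset
```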
Comment 7 W Unruh 2015-04-02 14:36:17 UTC
Created attachment 173021 [details]
Another user's partial dmesg output on switch on 4.0 r1

I asked on the major.io page, and another user said he saw the same problem all the way back to 3.18.2, but on the kernels before 4.0 it was sporadic. Another user also saw it sporadically on 3.19. On 4.0, various rc versions, he always sees it. He gets the included output in the dmesg output.
I have not seen it in the 3.19.{0,1,2} kernels, using the Mageia 5 .config file and patches, but I must admit I did not try the switching that often with the 3.19.2 kernel, so might not have caught a sporadic problem.
Comment 8 W Unruh 2015-04-03 16:05:09 UTC
Unfortunately, I do not think I am going to be able to do the kernel bisect. I have now tried 5 times and each time failed to get even the first bisect step. The first time through my stupidity; the second and third because I did not know how .config worked (e.g. commenting out a line does not remove it when you run make oldconfig); the fourth because the mkspec script was broken, since I have an rpm based system, and the cp command got an invalid option stuck in. The fifth time I tried to use deb-pkg, but of course after the compilation the system discovered that it did not have the right programs to make the .deb file. Each compilation takes over an hour.
Also, how in the world do you switch off DEBUG_KERNEL? My compilations are 10GB in size and make a /lib/modules of 2GB in size, which is absurd.

I hope that the dmesg dumps I have supplied are sufficient.
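
On the size question in this comment: most of a multi-gigabyte build tree usually comes from CONFIG_DEBUG_INFO (debug symbols) rather than from DEBUG_KERNEL itself, and make localmodconfig can trim the module list to what is currently loaded. The sketch below assumes it is run at the top of a kernel source tree that already has a .config; the function name is hypothetical, and localmodconfig may still ask questions about options that are new to the tree.

```shell
# Shrink an existing kernel .config for faster, smaller test builds.
# Run from the top of the kernel source tree.
slim_kernel_config() {
    # Keep only the modules that are loaded on the running system.
    make localmodconfig
    # Turn off debug symbols, which account for most of the build size.
    ./scripts/config --disable DEBUG_INFO
    # Settle any dependent options at their defaults without prompting.
    make olddefconfig
}
```

Because localmodconfig drops every module that is not loaded at the time, it is worth plugging in any hardware needed for the test (USB storage, network adapters) before running it.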
Comment 9 W Unruh 2015-04-03 17:19:17 UTC
So I tried again, using make menuconfig and being careful to erase the .config file first, and it now grabbed bad entries from .config.old.
This kernel compiling seems designed to trap the unwary into wasting huge amounts of time.
Comment 10 nils.alex 2015-04-03 18:58:33 UTC
Hi, I experience the same problem. I did a bisect today:

git bisect start '--' 'drivers/gpu/drm/i915'
# good: [19583ca584d6f574384e17fe7613dfaeadcdc4a6] Linux 3.16
git bisect good 19583ca584d6f574384e17fe7613dfaeadcdc4a6
# bad: [bfa76d49576599a4b9f9b7a71f23d73d6dcff735] Linux 3.19
git bisect bad bfa76d49576599a4b9f9b7a71f23d73d6dcff735
# bad: [a12624959ad4e3bfa8c344ad71728ffc9a379158] drm/i915: Update DRIVER_DATE to 20140905
git bisect bad a12624959ad4e3bfa8c344ad71728ffc9a379158
# bad: [6d93c0c41760c018ca02bf4c9164b9fda2184670] drm/i915: fix VDD state tracking after system resume
git bisect bad 6d93c0c41760c018ca02bf4c9164b9fda2184670
# good: [ea0c76f8c306716a301abbf28699c4ca0a102bed] drm/i915: Emphasize that ctx->obj & ctx->is_initialized refer to the legacy rcs ctx
git bisect good ea0c76f8c306716a301abbf28699c4ca0a102bed
# good: [eeefa889cddb8d7e4ee6ce0212e685dd624d66a1] drm/i915: Remove redundant HAS_PSR checks
git bisect good eeefa889cddb8d7e4ee6ce0212e685dd624d66a1
# good: [b98856a86b0a44ef9d0ab61f6855baf6d941fe6f] drm/i915: Replace HAS_PCH_SPLIT which incorrectly lets some platforms in
git bisect good b98856a86b0a44ef9d0ab61f6855baf6d941fe6f
# good: [11bed958b72e15fd12d78c30ce49047b94817840] drm/i915: mst topology dumper in debugfs (v0.2)
git bisect good 11bed958b72e15fd12d78c30ce49047b94817840
# good: [4dac3edfe68e5e1b3c2216b84ba160572420fa40] Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next
git bisect good 4dac3edfe68e5e1b3c2216b84ba160572420fa40
# good: [72b79c9b2f1b23ba63d7e215963fb90475286ff1] drm/i915: Update DRIVER_DATE to 20140725
git bisect good 72b79c9b2f1b23ba63d7e215963fb90475286ff1
# good: [1381308bb1e24fd7906eab3f046654041546cce3] drm/i915: wait for all DSI FIFOs to be empty
git bisect good 1381308bb1e24fd7906eab3f046654041546cce3
# bad: [f573de5a8474ae2cfc28424423c5a68c780a904b] drm/i915: Add correct hw/sw config check for DSI encoder
git bisect bad f573de5a8474ae2cfc28424423c5a68c780a904b
# bad: [aba86890a1785d787bfe7a741f910a472280540a] drm/i915: factor out intel_edp_panel_vdd_sanitize
git bisect bad aba86890a1785d787bfe7a741f910a472280540a
# first bad commit: [aba86890a1785d787bfe7a741f910a472280540a] drm/i915: factor out intel_edp_panel_vdd_sanitize

At first I believed that I had found the bad commit, BUT then, unfortunately, the problem occurred in the latest working commit (according to the above bisect) as well. It just occurs rarely around these commits. I have yet to find a working version. Next I tried 3.14, which again behaves very badly (just like 4.0).

Note that during the bisect I started to encounter a different problem: the screen started to flicker horizontally, sometimes more, sometimes less, which made it hard to decide whether a given commit was actually good or bad with respect to the problem at hand.
Comment 11 W Unruh 2015-04-03 23:28:39 UTC
Very strange. I am doing a bisect now. (I finally got it working, paring down the modules etc. so it now only takes 1/2 hour per compile; there must be some more efficient way to do it.) I do not have any trouble with 3.19.0, so I am narrowing it down to somewhere between that and 4.0 rc4. I will post when I finish (I have about another 6 bisections to go). I sure wish there were some way to run the bisections so that only those modules/files that had changed got recompiled. But then I am doing the rpm-pkg, and it of course saves nothing.
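
On the wish for incremental rebuilds: building with plain make in the same source tree, instead of packaging with rpm-pkg, lets make reuse object files from the previous bisect step, so typically only the files touched by the newly checked-out revision (plus anything that depends on changed headers or configuration) are recompiled. A minimal sketch, assuming the kernel is then installed by hand; the function name is invented here.

```shell
# Rebuild the kernel in place, reusing objects from the previous build.
# Any arguments are passed through to make as targets.
rebuild_in_place() {
    # Use one job per CPU when nproc is available, otherwise a single job.
    njobs=$(nproc 2>/dev/null || echo 1)
    make -j"$njobs" "$@"
}

# Typical use between two bisect steps, from the top of the kernel tree:
#   rebuild_in_place
#   sudo make modules_install install
```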

So far I have (BISECT_LOG) 
git bisect start
# good: [bfa76d49576599a4b9f9b7a71f23d73d6dcff735] Linux 3.19
git bisect good bfa76d49576599a4b9f9b7a71f23d73d6dcff735
# bad: [13a7a6ac0a11197edcd0f756a035f472b42cdf8b] Linux 4.0-rc2
git bisect bad 13a7a6ac0a11197edcd0f756a035f472b42cdf8b
# good: [22aa66a3ee5b61e0f4a0bfeabcaa567861109ec3] dm snapshot: fix a possible invalid memory access on unload
git bisect good 22aa66a3ee5b61e0f4a0bfeabcaa567861109ec3
# bad: [796e1c55717e9a6ff5c81b12289ffa1ffd919b6f] Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
git bisect bad 796e1c55717e9a6ff5c81b12289ffa1ffd919b6f
# good: [9682ec9692e5ac11c6caebd079324e727b19e7ce] Merge tag 'driver-core-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
git bisect good 9682ec9692e5ac11c6caebd079324e727b19e7ce
# good: [a9724125ad014decf008d782e60447c811391326] Merge tag 'tty-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
git bisect good a9724125ad014decf008d782e60447c811391326
# bad: [f43dff0ee00a259f524ce17ba4f8030553c66590] Merge tag 'drm-amdkfd-next-fixes-2015-01-25' of git://people.freedesktop.org/~gabbayo/linux into drm-next
git bisect bad f43dff0ee00a259f524ce17ba4f8030553c66590


Note that even with the later "good" ones, when I run "shut down computer", the brief view of the text screen sometimes has a lost hsync. This seems to be sporadic.
When I test, I do alt-ctrl-Fx and then alt-ctrl-F1 to get back to X a number of times (say 4), and the result is always consistent (i.e., either lost hsync in the tty, or it is fine). I do not see any horizontal flicker (I assume that means in X, but I do not see it in the tty console either).

Strange.
Comment 12 W Unruh 2015-04-04 01:07:24 UTC
The bisect has gone crazy. I set 3.19 and 4.0 rc4 as the boundaries. The last three have gone backward, and the last one was 3.18 rc7. I.e., it has gone outside the two boundaries I set.
But it does seem that there was a brief and shining hour around 3.19 (I am running 3.19.2 mga 5) when this hsync problem did not happen, and if one goes earlier or much later, problems arise. I am not sure what the git algorithm is for this bisection, but it does appear to have an instability in it.

This may also be the problem that nils had.
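
A likely explanation for the "backward" version numbers seen here: bisect does not walk a linear list of releases. It considers every commit reachable from the bad revision that is not reachable from the good one, and that includes commits on topic branches that forked from an older base (such as 3.18-rc7) but were merged only after the good tag. Checking out one of those commits gives a tree whose Makefile reports the older version, even though the commit is legitimately inside the range. The sketch below lists that candidate set; the function name is made up for illustration.

```shell
# List the commits bisect will consider between two revisions: everything
# reachable from the bad revision that is not reachable from the good one.
# Side-branch commits merged after the good revision are part of this set,
# even when they were developed on top of an older release.
bisect_candidates() {
    good=$1
    bad=$2
    git rev-list "$good..$bad"
}

# Example, inside a kernel clone:
#   bisect_candidates v3.19 v4.0-rc4 | wc -l
```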
Comment 13 W Unruh 2015-04-04 23:32:13 UTC
Created attachment 173151 [details]
kernel bisect log

Here is the kernel bisect log. 

OK, sorry about that rant. It is always uncomfortable when one's illusion of how the world works meets reality. In this case, my linear model of how kernel development works met the actuality of multiple threads in kernel development, not all of which come together at any one release. Ie, threads which  
Anyway, I finally finished my bisection. I should probably have told it to concentrate on drm/i915, since that is where it finished up.

# first bad commit: [cf4c7c12258ed9367f4fc45238f5f50d2db892c1] drm/i915: Make all plane disables use 'update_plane' (v5)
 
(This is not what Nils got. There must have been a similar problem on an earlier kernel sequence, which got fixed and then broken again.)

Note that while the Mageia 3.19.2 kernel (sometimes) loses hsync on shutdown (although the tty consoles work), on the bisections, at least toward the end, the hsync was good on the good kernels and bad on the bad kernels.
Now, I did not test any of the kernels in the bisection extensively (about three or four tty selections after X login, and one or two at the KDM screen), and I only shut down once each time, so there may still be sporadic problems there.
Comment 14 W Unruh 2015-04-10 16:53:04 UTC
Is any more information needed to track this down? I am losing the XPS 13 in a couple of days, so I cannot carry out further tests after that.
Comment 15 Tasev Nikola 2015-05-04 18:46:18 UTC
Hi,

I have an Asus UX305fa and I'm experiencing the same problem. 
00:02.0 VGA compatible controller: Intel Corporation Broadwell-U Integrated Graphics (rev 08)
After bisecting, I found the same bad commit as W Unruh: 

root@laura:~/Bureau/documents/linux-git# git bisect good 
cf4c7c12258ed9367f4fc45238f5f50d2db892c1 is the first bad commit
commit cf4c7c12258ed9367f4fc45238f5f50d2db892c1
Author: Matt Roper <matthew.d.roper@intel.com>
Date:   Thu Dec 4 10:27:42 2014 -0800  

But I don't know how to revert this commit to test if it's the good one. 
The 3.19 kernel works fine except the hor sync is messed up when the computer is shut down (the final splash screen).  

The 4.0-rc1 doesn't work for switch to tty1,2,3,4,5 (with ctrl-alt-f1).  

The dmesg with drm.debug = 14 is attached for the good and bad kernels.

If more info is needed, just tell me

Tasev
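
On the question of how to revert the suspect commit: one common approach is to check out a known-bad kernel, revert just that commit on top of it, rebuild, and see whether the tty switch behaves again. The sketch below assumes a git clone of the kernel; the function name is hypothetical. Later commits may depend on the one being reverted, in which case the revert will not apply cleanly and the function backs out.

```shell
# Revert a single commit on top of the currently checked-out revision.
# If the revert conflicts, undo the attempt and return non-zero.
try_revert() {
    sha=$1
    if ! git revert --no-edit "$sha"; then
        git revert --abort
        return 1
    fi
}

# Example, inside a kernel clone:
#   git checkout v4.0-rc1
#   try_revert cf4c7c12258ed9367f4fc45238f5f50d2db892c1
# then rebuild, boot, and repeat the tty switch test.
```

If the kernel with the revert applied no longer loses sync, that is strong evidence the bisect result is correct.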
Comment 16 Tasev Nikola 2015-05-04 18:48:30 UTC
Created attachment 175811 [details]
Dmesg with drm.debug=14 good kernel
Comment 17 Tasev Nikola 2015-05-04 18:49:27 UTC
Created attachment 175821 [details]
Dmesg with drm.debug=14 bad kernel
Comment 18 Tasev Nikola 2015-05-06 18:38:55 UTC
Hi,

I just tried the latest 4.1-rc2 kernel and the bug is still there.
I tried with the latest drivers from the oibaf ppa with the same result.
I also tried with various kernel parameters like i915_enable_rc6=6, i915.lvds_downclock=1 with no luck.
Attached is the dmesg after I try to switch to the console with ctrl-alt-F1 and back to the graphical session with ctrl-alt-F7.
Comment 19 Tasev Nikola 2015-05-06 18:40:39 UTC
Created attachment 176061 [details]
dmesg kernel 4.1-rc1 drm.debug=14
Comment 20 Tasev Nikola 2015-05-12 09:47:26 UTC
Hi, 

I tried the 4.1-rc3 now, the problem is still there.

Is there anything else that I could do (as an average user) to help debug this?

Thanks
Comment 21 Jani Nikula 2015-05-12 10:30:42 UTC
(In reply to Tasev Nikola from comment #19)
> Created attachment 176061 [details]
> dmesg kernel 4.1-rc1 drm.debug=14

There's a 

[   55.128510] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun

but nothing else suspicious that I can see.

(In reply to Tasev Nikola from comment #20)
> Is there anything else that I could do (as an average user) to help debug this?

Building and testing kernels, you're not an average user! ;)

Since you're at it, please try the drm-intel-nightly branch of http://cgit.freedesktop.org/drm-intel
Comment 22 Tasev Nikola 2015-05-17 18:07:35 UTC
(In reply to Jani Nikula from comment #21)
> (In reply to Tasev Nikola from comment #19)
> > Created attachment 176061 [details]
> > dmesg kernel 4.1-rc1 drm.debug=14
> 
> There's a 
> 
> [   55.128510] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU
> pipe A FIFO underrun
> 
> but nothing else suspicious that I can see.
> 
> (In reply to Tasev Nikola from comment #20)
> > Is there anything else that I could do (as an average user) to help debug this?
> 
> Building and testing kernels, you're not an average user! ;)
> 
> Since you're at it, please try the drm-intel-nightly branch of
> http://cgit.freedesktop.org/drm-intel


Hi,

Just tried drm-intel-nightly and the bug is still there.
Comment 23 Nicholas Narsing 2015-05-31 21:13:45 UTC
Some people have confirmed that disabling IPS (via i915.enable_ips=1) fixes this issue: https://disqus.com/home/discussion/majorio/linux_support_for_the_dell_xps_13_9343_2015_model/#comment-2054346856

Might be worth looking into.
Comment 24 W Unruh 2015-06-01 01:41:15 UTC
I believe you mean 
i915.enable_ips=0
as a grub command line parameter. 
I have not yet tried it, but yes, many reports say it fixes the problem on the Dell XPS13 9343
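
To check whether the parameter took effect after booting with i915.enable_ips=0, the current value can usually be read back from sysfs. The sketch below assumes the running kernel exposes the parameter at /sys/module/i915/parameters/enable_ips and that the file is readable (it may require root); the function name is invented, and an alternative path can be passed in for testing.

```shell
# Print the current value of the i915 enable_ips module parameter,
# or "unknown" if the file is missing or not readable by this user.
# The default sysfs path is an assumption about the running kernel.
i915_ips_setting() {
    file=${1:-/sys/module/i915/parameters/enable_ips}
    if [ -r "$file" ]; then
        cat "$file"
    else
        echo "unknown"
    fi
}

# Example: prints 0 when IPS has been disabled on the command line.
#   i915_ips_setting
```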
Comment 25 Nicholas Narsing 2015-06-01 05:08:22 UTC
Yes, that's what I meant; apologies for the typo.
Comment 26 nils.alex 2015-06-01 11:53:55 UTC
i915.enable_ips=0 seems to fix the problem for me (using 4.1-rc6+). What do I miss without ips?
Comment 27 Tasev Nikola 2015-06-01 16:15:28 UTC
i915.enable_ips=0 fixes the problem for my Asus UX305FA too.

Thank you.
Comment 28 Ben Widawsky 2015-06-01 18:09:59 UTC
(In reply to nils.alex from comment #26)
> i915.enable_ips=0 seems to fix the problem for me (using 4.1-rc6+). What do
> I miss without ips?

If you're able to get into the higher package C-states, probably not a lot. IPS is an intermediate buffer which allows longer delays between DRAM fetches for the scanout, enabling DDR to sleep when it otherwise may not have been able to.

For higher resolution scanouts, it can be quite important.
Comment 29 Jani Nikula 2015-06-02 07:02:18 UTC
(In reply to Ben Widawsky from comment #28)
> (In reply to nils.alex from comment #26)
> > i915.enable_ips=0 seems to fix the problem for me (using 4.1-rc6+). What do
> > I miss without ips?
> 
> If you're able to get into the higher package C-states, probably not a lot.
> IPS is an intermediate buffer which allows longer delays between DRAM
> fetches for the scanout, enabling DDR to sleep when it otherwise may not
> have been able to.
> 
> For higher resolution scanouts, it can be quite important.

At the higher resolutions, you may hit bug [1]. We shouldn't enable IPS when the pipe pixel rate is greater than 95% of the CDCLK frequency.

I've presumed all or most bugs that get fixed by i915.enable_ips=0 would be that. Maybe we have something else there too. But it seems clear to me there are several different issues mixed up in this report.

[1] https://bugs.freedesktop.org/show_bug.cgi?id=83497
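
The 95% rule stated in this comment amounts to a simple comparison between the pipe pixel rate and the CDCLK frequency. The sketch below only illustrates that arithmetic with integer kHz values; it is not the driver's code, and the function and variable names are made up.

```shell
# Succeed if IPS may stay enabled under the 95% rule: the pipe pixel
# rate must not exceed 95% of the CDCLK frequency. Both values are
# integers in kHz; integer math avoids any floating-point dependency.
ips_allowed() {
    pixel_rate_khz=$1
    cdclk_khz=$2
    [ $((pixel_rate_khz * 100)) -le $((cdclk_khz * 95)) ]
}

# Example with illustrative numbers (not measured on any real machine):
#   ips_allowed 320000 337500 && echo "IPS ok"          # 320000 <= 320625
#   ips_allowed 330000 337500 || echo "IPS must stay off"
```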
Comment 30 W Unruh 2015-06-02 12:15:30 UTC
I probably do not understand what IPS does, but it seems strange that for me the problem occurs when the terminal display is activated from X. I would think that the terminal (just white text on a black background) would have a much lower pixel rate than a full-fledged X display. Also, the terminal works fine before enabling X. It is only when switching out of X that this "loss of hsync" occurs. I did not see any flickering in X, which seemed to be the problem reported in [1].
Also, this bug seems to have been introduced in 3.18, is not there in 3.19, and resurfaces in 4.0 (but then the i915 thread seems to have skipped 3.19).
It would be great if work were put into this, since it is affecting lots of people.
Note that the resolution of the screen I was working with was HD (1.9Kx1K), not the super high resolution (4Kx2K).
Comment 31 Adrian Bjugård 2015-06-02 14:40:14 UTC
I have this issue on my 2015 13" Retina MacBook Pro as well. It has the i5-5257U processor with the Intel Iris 6100 graphics chip. I can confirm that running Linux with the option i915.enable_ips=0 fixes the issue on 4.1-rc4 for my computer as well.

Video of the issue in action: https://www.dropbox.com/s/r3masb8hc7rvqtc/MOV_0052.mp4?dl=0
Comment 32 Jani Nikula 2015-06-04 13:29:23 UTC
(In reply to Jani Nikula from comment #29)
> At the higher resolutions, you may hit bug [1]. We shouldn't enable IPS when
> the pipe pixel rate is greater than 95% of the CDCLK frequency.
> 
> I've presumed all or most bugs that get fixed by i915.enable_ips=0 would be
> that. Maybe we have something else there too. But it seems clear to me there
> are several different issues mixed up in this report.
> 
> [1] https://bugs.freedesktop.org/show_bug.cgi?id=83497

This has been fixed by

commit 184e4c49484501f3061ae9b267af818c6894fea9
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Wed Jun 3 15:45:11 2015 +0300

    drm/i915: Don't enable IPS when pixel rate exceeds 95%

in drm-intel-nightly. Please try that, *without* i915.enable_ips parameter, and report back.
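
Before testing, it can help to confirm that the tree being built actually contains the fix. A minimal sketch, assuming a git checkout of the branch under test; the function name is hypothetical. Note that nightly branches are sometimes rebuilt, so a commit can reappear under a different hash, in which case searching the log for the subject line is the fallback.

```shell
# Succeed if commit $1 is contained in (is an ancestor of) revision $2,
# which defaults to the currently checked-out HEAD.
contains_commit() {
    git merge-base --is-ancestor "$1" "${2:-HEAD}"
}

# Example, inside the checkout that is about to be built:
#   contains_commit 184e4c49484501f3061ae9b267af818c6894fea9 \
#       && echo "IPS pixel-rate fix is present"
# Fallback when hashes differ:
#   git log --oneline --grep="Don't enable IPS when pixel rate exceeds 95%"
```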
Comment 33 Tasev Nikola 2015-06-04 18:30:40 UTC
Hi,

Just tried on my asus UX305FA drm-nightly without the 
i915.enable_ips=0 parameter and it works !

Thank you
Comment 34 soda 2015-06-05 08:34:14 UTC
(In reply to Jani Nikula from comment #32)
> This has been fixed by
> 
> commit 184e4c49484501f3061ae9b267af818c6894fea9
> Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Date:   Wed Jun 3 15:45:11 2015 +0300
> 
>     drm/i915: Don't enable IPS when pixel rate exceeds 95%
> 
> in drm-intel-nightly. Please try that, *without* i915.enable_ips parameter,
> and report back.


Confirmed, no issues switching tty anymore (4.1.0-rc6-drm_intel_nightly_20150605, XPS13/FHD).
Comment 35 Jani Nikula 2015-06-05 09:29:24 UTC
Great, thanks for testing! Now, off you go, have a happy weekend. ;)

The rest of you for whom there still is flicker and issues with drm-intel-nightly, please try http://patchwork.freedesktop.org/patch/50675 on top of nightly *without* i915.enable_ips parameter, and report back.