Bug 58361 - [gen4 sdvo regression] connector state mismatch
Summary: [gen4 sdvo regression] connector state mismatch
Status: RESOLVED INSUFFICIENT_DATA
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - Intel)
Hardware: All Linux
Importance: P1 normal
Assignee: Daniel Vetter
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-05-16 18:10 UTC by Rogério Brito
Modified: 2014-08-14 08:18 UTC

See Also:
Kernel Version: 3.9
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
dmesg log of the stack trace (in a git bisect session) (85.76 KB, text/x-log)
2013-05-16 18:10 UTC, Rogério Brito
The log of my git bisect session (2.82 KB, text/x-log)
2013-05-16 18:11 UTC, Rogério Brito
dmesg with boot with drm.debug=0xe (115.64 KB, text/x-log)
2013-06-16 12:25 UTC, Rogério Brito
dmesg from one 3.11-rc kernel with drm.debug=0xe (27.52 KB, application/x-gzip)
2013-08-08 22:17 UTC, Rogério Brito
dmesg from 3.11 kernel with drm.debug=0xe, without CONFIG_FB_INTEL (15.38 KB, application/x-xz)
2013-09-17 23:15 UTC, Rogério Brito
config file for the 3.11 kernel (99.78 KB, text/plain)
2013-09-17 23:16 UTC, Rogério Brito

Description Rogério Brito 2013-05-16 18:10:32 UTC
Created attachment 101761
dmesg log of the stack trace (in a git bisect session)

Hi.

I have been using the stable 3.8 kernel series, and it started giving me stack traces whenever X closes or terminates (including on logout and hibernation), which taints my kernel.

I laboriously bisected the problem and it points (surprisingly to me) to a commit by Takashi Iwai in ALSA/hda. This commit was backported to the 3.8 tree, which is why I encountered it on that version (Debian still doesn't have a precompiled 3.9 kernel).

I am attaching the dmesg log and I can provide any kind of information that you think that may be useful for debugging.


Thanks in advance,

Rogério Brito.
Comment 1 Rogério Brito 2013-05-16 18:11:43 UTC
Created attachment 101771
The log of my git bisect session
Comment 2 Takashi Iwai 2013-05-17 06:18:31 UTC
Are you seeing this in 3.8.y kernel, or 3.9 kernel?  From your description, it sounds like you're seeing the problem in 3.8.y stable kernel while the earlier 3.8 didn't show it.  Or, do you mean that it starts showing with 3.9 while 3.8 didn't?

Either way, simply try to revert the commit you spotted.  It *must* fix the problem, if the bisection was correct.

If you meant a 3.8.y regression, you performed the bisection the wrong way -- it was a bisection between the 3.8 and 3.9 kernels.  In this case, you need to bisect between 3.8.0 and the latest 3.8.y kernel in the linux-stable git repo.
Comment 3 Rogério Brito 2013-05-17 08:43:09 UTC
Dear Takashi,

(In reply to comment #2)
> Are you seeing this in 3.8.y kernel, or 3.9 kernel?  From your description, it
> sounds like you're seeing the problem in 3.8.y stable kernel while the earlier
> 3.8 didn't show it.  Or, do you mean that it starts showing with 3.9 while 3.8
> didn't?

Oops, let me clear the confusion:

* I first saw the problem with Debian's 3.8.y stable kernel, which (obviously) includes the patches that were marked to stable (the case of the patch in question).

* I booted with Ubuntu's precompiled snapshot "mainline" kernels of version 3.10-rc1, which should be pristine as they announce (just to save me the trouble of compilation).

* Then, I grabbed Linus's kernel, did the bisection on *his* tree, and found your commit as the one that broke my system.

> Either way, simply try to revert the commit you spotted.  It *must* fix the
> problem, if the bisection was correct.

I reverted, compiled, tested and things seem fixed.

For my tests, I included logging out of X (which was one source of such stack traces) and hibernating (which was another source of the stack traces).

They don't happen anymore with the commit reverted.

I hope that this clears the situation. If you need any other data (dumps, configurations etc.), please let me know.


Thanks for the quick reply,

Rogério Brito.
Comment 4 Takashi Iwai 2013-05-17 09:18:23 UTC
OK, then it's interesting.  I can say that it's just a coincidence that the commit leads to the bug.  Maybe HDMI audio starts getting recognized properly by that fix, and this influenced the behavior of i915 modeset.  These stack traces are just results of some warnings, and nothing serious.

For further debugging, try to enable the drm debug option, e.g. by passing drm.debug=0x0e at boot time, and capture the kernel log up to the point of the warnings.  The rest is the job of Intel guys :)
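
For reference, drm.debug is a bitmask of debug categories. The following is a minimal Python sketch of what 0x0e selects, assuming the DRM_UT_* bit values used by kernels of this era (core 0x01, driver 0x02, kms 0x04, prime 0x08). The table and the function name decode_drm_debug are illustrative only and are not part of any kernel tool.

```python
# Bit values assumed from the DRM_UT_* defines in 3.x-era kernels.
DRM_DEBUG_BITS = {
    0x01: "core",
    0x02: "driver",
    0x04: "kms",
    0x08: "prime",
}


def decode_drm_debug(mask):
    """Return the names of the debug categories enabled by the given mask."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


if __name__ == "__main__":
    # 0x0e and 0xe are the same value; both spellings appear in this report.
    print(decode_drm_debug(0x0e))
```

Under these assumptions, 0x0e turns on driver, KMS and PRIME messages while leaving the much noisier core messages off.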
Comment 5 Daniel Vetter 2013-05-20 18:56:23 UTC
Yeah, modeset state checking gone wrong, we need the debug logs ...
Comment 6 Daniel Vetter 2013-06-16 11:33:51 UTC
Ping for the debug logs ... please boot with drm.debug=0xe added to your kernel cmdline, reproduce the issue and then attach the complete dmesg.
Comment 7 Rogério Brito 2013-06-16 12:06:26 UTC
Hi.

On Sun, Jun 16, 2013 at 8:33 AM,  <bugzilla-daemon@bugzilla.kernel.org> wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=58361
> --- Comment #6 from Daniel Vetter <daniel@ffwll.ch>  2013-06-16 11:33:51 ---
> Ping for the debug logs ... please boot with drm.debug=0xe added to your kernel
> cmdline, reproduce the issue and then attach the complete dmesg.

Thanks for the ping.

One clarification, though: should I boot with that option with the
kernel that shows the stack trace? Or with a kernel that doesn't?
Anything else?


Thanks,

-- 
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
http://cynic.cc/blog/ : github.com/rbrito : profiles.google.com/rbrito
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br
Comment 8 Daniel Vetter 2013-06-16 12:08:21 UTC
We need the debug dmesg from a broken kernel from boot-up up to the first pile of WARNs. The debug output should help us in reconstructing the state and figuring out how things broke ...
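
To check that a captured log really reaches the first warning, a small helper can report the timestamp at which it appears. This is a rough Python sketch that assumes the usual "[ cut here ]" and "WARNING:" markers and bracketed dmesg timestamps; the function name first_warn_timestamp and the sample lines are made up for illustration and are not taken from the attached logs.

```python
import re

# Matches the leading "[   12.345678]" timestamp that dmesg prints.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def first_warn_timestamp(dmesg_text):
    """Return the timestamp (seconds) of the first warning marker, or None."""
    for line in dmesg_text.splitlines():
        if "[ cut here ]" in line or "WARNING:" in line:
            match = TIMESTAMP.match(line)
            return float(match.group(1)) if match else None
    return None


if __name__ == "__main__":
    # Synthetic sample, for illustration only.
    sample = (
        "[    1.000000] booting\n"
        "[   98.250000] ------------[ cut here ]------------\n"
        "[   98.250010] WARNING: at example.c:1\n"
    )
    print(first_warn_timestamp(sample))
```

The helper only locates the warning; the complete dmesg from boot onwards is still what should be attached.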
Comment 9 Rogério Brito 2013-06-16 12:22:37 UTC
Hi, Daniel.

On Sun, Jun 16, 2013 at 9:08 AM,  <bugzilla-daemon@bugzilla.kernel.org> wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=58361
> --- Comment #8 from Daniel Vetter <daniel@ffwll.ch>  2013-06-16 12:08:21 ---
> We need the debug dmesg from a broken kernel from boot-up up to the first pile
> of WARNs. The debug output should help us in reconstructing the state and
> figuring out how things broke ...

Grabbing it now. Will attach it in the next post.


Thanks,

Comment 10 Rogério Brito 2013-06-16 12:25:55 UTC
Created attachment 104831
dmesg with boot with drm.debug=0xe
Comment 11 Rogério Brito 2013-06-16 12:30:41 UTC
Dear Takashi,

(In reply to comment #4)
> OK, then it's interesting.  I can say that it's just a coincidence that the
> commit leads to the bug.  Maybe HDMI audio starts getting recognized properly
> by that fix, and this influenced the behavior of i915 modeset.  These stack
> traces are just results of some warnings, and nothing serious.

Indeed, now that I see this, I can get HDMI audio out, which I couldn't when I wrote https://bugzilla.kernel.org/show_bug.cgi?id=51421

As long as there is no corruption in the kernel data structures, I prefer to have HDMI sound and cope with the stack traces instead of the alternative. :)

> For further debugging, try to enable the drm debug option, e.g. by passing
> drm.debug=0x0e at boot time, and capture the kernel log up to the point of
> the warnings.  The rest is the job of Intel guys :)

OK, sent it to Daniel Vetter. Let's see if we can have everything fixed, then.


Thank you all,

Rogério Brito.
Comment 12 Daniel Vetter 2013-06-24 17:31:10 UTC
Ah, the connector state mismatch is on an SDVO output. We've had a few funny bugs in there, can you please retest with latest 3.10-rc kernels and if it's still broken, attach a new drm.debug=0xe dmesg?
Comment 13 Jani Nikula 2013-08-08 09:48:43 UTC
(In reply to Daniel Vetter from comment #12)
> Ah, the connector state mismatch is on an SDVO output. We've had a few funny
> bugs in there, can you please retest with latest 3.10-rc kernels and if it's
> still broken, attach a new drm.debug=0xe dmesg?

Rogério, please try the latest 3.11-rc kernels, and if it's still broken, attach dmesg with drm.debug=0xe module parameter. Thank you.
Comment 14 Rogério Brito 2013-08-08 10:23:30 UTC
Hi, Jani.

On Thu, Aug 8, 2013 at 6:48 AM,  <bugzilla-daemon@bugzilla.kernel.org> wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=58361
>
> --- Comment #13 from Jani Nikula <jani.nikula@intel.com> ---
> (In reply to Daniel Vetter from comment #12)
>> Ah, the connector state mismatch is on an SDVO output. We've had a few funny
>> bugs in there, can you please retest with latest 3.10-rc kernels and if it's
>> still broken, attach a new drm.debug=0xe dmesg?
>
> Rogério, please try the latest 3.11-rc kernels, and if it's still broken,
> attach dmesg with drm.debug=0xe module parameter. Thank you.

OK, I'm heading to bed right now, but as soon as I wake up I will report back.


Thanks,
Comment 15 Rogério Brito 2013-08-08 22:17:31 UTC
Created attachment 107160
dmesg from one 3.11-rc kernel with drm.debug=0xe

Hi,

As promised, attached is the complete dmesg that I get when I boot with the precompiled kernel by Ubuntu, showing the stack trace (starting at about 98 seconds).

This was taken from

http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/2013-08-08-saucy/

And according to the contents of that directory, it seems that the top commit is: b7bc9e7d808ba55729bd263b0210cda36965be32.

Is there any further information that I can provide? Any other trees that I may test to fix this issue? It would be superb if a patch like this could be backported to the -stable kernels.


Thanks,

Rogério Brito.
Comment 16 Jani Nikula 2013-08-09 07:30:39 UTC
SDVO state tracking for Daniel to figure out now that we have the dmesg.
Comment 17 Jani Nikula 2013-08-09 07:40:01 UTC
FWIW I don't think the bisect result below is plausible, therefore dropping "bisected" from subject.

commit ea9b43addc4d90ca5b029f47f85ca152320a1e8d
Author: Takashi Iwai <tiwai@suse.de>
Date:   Tue Feb 12 17:02:41 2013 +0100

    ALSA: hda - Fix broken workaround for HDMI/SPDIF conflicts
Comment 18 Daniel Vetter 2013-08-09 08:37:33 UTC
Hm, are there any bad side effects other than the backtraces?
Comment 19 Rogério Brito 2013-08-09 08:47:21 UTC
Hi, Daniel.

On Fri, Aug 9, 2013 at 5:37 AM,  <bugzilla-daemon@bugzilla.kernel.org> wrote:
> --- Comment #18 from Daniel Vetter <daniel@ffwll.ch> ---
> Hm, are there any bad side effects other than the backtraces?

Not that I have noticed (read: I'm only coding right now and I don't
see any problems). Of course, I can try a more extensive test trying
to attach the notebook to an HDMI TV and to attach it to an analog/VGA
monitor and send you the results.


Thanks for the feedback,
Comment 20 Daniel Vetter 2013-08-09 09:15:00 UTC
[   10.404467] intelfb: Framebuffer driver for Intel(R) 830M/845G/852GM/855GM/865G/915G/915GM/945G/945GM/945GME/965G/965GM chipsets

I do not know how you actually managed to pull this off, but you have two display drivers fighting over the same piece of hw. That tends to end up in tears.

Can you please retest with CONFIG_FB_INTEL=n ?
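
A quick way to spot this situation in a saved log is to look for both drivers announcing themselves. Here is a minimal Python sketch, assuming that intelfb lines start with "intelfb:" (as in the line quoted above) and that i915 lines mention "i915" as a separate word. The patterns, the function names and the i915 sample line are assumptions for illustration.

```python
import re

# Heuristic patterns; real log lines may differ between kernel versions.
DRIVER_PATTERNS = {
    "intelfb": re.compile(r"\bintelfb:"),
    "i915": re.compile(r"\bi915\b"),
}


def display_drivers_seen(dmesg_text):
    """Return the set of display driver names that appear in the log."""
    seen = set()
    for line in dmesg_text.splitlines():
        for name, pattern in DRIVER_PATTERNS.items():
            if pattern.search(line):
                seen.add(name)
    return seen


def has_driver_conflict(dmesg_text):
    """True when both intelfb and i915 show up in the same log."""
    return {"intelfb", "i915"} <= display_drivers_seen(dmesg_text)


if __name__ == "__main__":
    sample = (
        "[    9.000000] [drm] Initialized i915\n"  # synthetic line
        "[   10.404467] intelfb: Framebuffer driver for Intel(R) "
        "830M/845G/852GM/855GM/865G/915G/915GM/945G/945GM/945GME/965G/965GM chipsets\n"
    )
    print(has_driver_conflict(sample))
```

A positive result only suggests that both drivers were loaded; retesting with CONFIG_FB_INTEL=n, as requested above, remains the real check.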
Comment 21 Rogério Brito 2013-08-09 09:48:16 UTC
Hi, Daniel.

On Fri, Aug 9, 2013 at 6:15 AM,  <bugzilla-daemon@bugzilla.kernel.org> wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=58361
>
> --- Comment #20 from Daniel Vetter <daniel@ffwll.ch> ---
> [   10.404467] intelfb: Framebuffer driver for Intel(R)
> 830M/845G/852GM/855GM/865G/915G/915GM/945G/945GM/945GME/965G/965GM chipsets
>
> I do not know how you actually managed to pull this off, but you have two
> display drivers fighting over the same piece of hw. That tends to end up in
> tears.

I am going to bed right now, but I can say that I only downloaded
Ubuntu's precompiled kernels. I also get the same behavior with
Debian's kernels.

It's been a long, long, long time since I compiled my own kernels for
my notebooks/desktops (apart from specialty/embedded systems), which
is to say that I am simply getting what the popular distributions
offer.

> Can you please retest with CONFIG_FB_INTEL=n ?

Sure. I will report back after I take some sleep.


Thanks a lot as usual,
Comment 22 Jani Nikula 2013-09-09 11:53:40 UTC
(In reply to Rogério Brito from comment #21)
> Sure. I will report back after I take some sleep.

Ping for the info; hope you had a good sleep. ;)
Comment 23 Rogério Brito 2013-09-17 23:13:39 UTC
(In reply to Jani Nikula from comment #22)
> (In reply to Rogério Brito from comment #21)
> > Sure. I will report back after I take some sleep.
> 
> Ping for the info; hope you had a good sleep. ;)

Hi there. Thanks for the ping.

I recompiled the kernel (version 3.11) and I still get the messages. I am attaching the dmesg log to this bug report. I will also post the config file that I used, so that you people can guide me in further steps (which I am sure will be needed).

Thanks a lot,

Rogério.
Comment 24 Rogério Brito 2013-09-17 23:15:20 UTC
Created attachment 108761
dmesg from 3.11 kernel with drm.debug=0xe, without CONFIG_FB_INTEL
Comment 25 Rogério Brito 2013-09-17 23:16:09 UTC
Created attachment 108771
config file for the 3.11 kernel
Comment 26 Rogério Brito 2013-11-29 10:28:39 UTC
ping.
Comment 27 Daniel Vetter 2013-11-29 15:54:54 UTC
We still seem to have a fairly hard time getting a consistent state out of the sdvo controller. Can you please retest with latest drm-intel-testing from

http://cgit.freedesktop.org/~danvet/drm-intel/

to make sure we're not falling over an already fixed bug?

It's not a multifunc sdvo chip - those are known to be a bit broken with our code ...
Comment 28 Jani Nikula 2014-08-14 08:18:08 UTC
Timeout. Please reopen if the problem persists with recent kernels.
