Created attachment 101761 [details] dmesg log of the stack trace (in a git bisect session) Hi. I have been using the stable 3.8 kernel series, and it started giving me stack traces whenever X closes or terminates (including on logout and hibernation), after which my kernel is tainted. I laboriously bisected the problem and, surprisingly to me, it points to a commit by Takashi Iwai in ALSA/hda. This commit was backported to the 3.8 tree, which is why I encountered it on that version (Debian still doesn't have a precompiled 3.9 kernel). I am attaching the dmesg log, and I can provide any other information that you think may be useful for debugging. Thanks in advance, Rogério Brito.
Created attachment 101771 [details] The log of my git bisect session
Are you seeing this in 3.8.y kernel, or 3.9 kernel? From your description, it sounds like you're seeing the problem in 3.8.y stable kernel while the earlier 3.8 didn't show it. Or, do you mean that it starts showing with 3.9 while 3.8 didn't? In either way, simply try to revert the commit you spotted. It *must* fix the problem, if the bisection were correct. If you meant a 3.8.y regression, you performed the bisection the wrong way -- it was a bisection between the 3.8 and 3.9 kernels. In that case, you need to bisect between 3.8.0 and the latest 3.8.y kernel in the linux-stable git repo.
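A rough sketch of the two steps suggested above, written as a small Python helper that only builds the git command lines and does not run them. The helper names `bisect_commands` and `revert_command` are hypothetical, and the "bad" tag is a placeholder, since the thread does not name the latest 3.8.y release.

```python
import shlex


def bisect_commands(good, bad):
    """Return the git commands for a bisect bounded by two revisions.

    `git bisect start <bad> <good>` marks both endpoints in one step.
    """
    return [
        ["git", "bisect", "start", bad, good],
        # After building and testing each kernel that git checks out, mark it
        # with one of the next two commands until git names the first bad commit.
        ["git", "bisect", "good"],
        ["git", "bisect", "bad"],
        ["git", "bisect", "log"],    # record of the session, useful to attach
        ["git", "bisect", "reset"],  # return to the original checkout
    ]


def revert_command(commit):
    """Return the git command that reverts a single suspect commit."""
    return ["git", "revert", "--no-edit", commit]


if __name__ == "__main__":
    # "v3.8.y-latest" is a placeholder, not a real tag name.
    for cmd in bisect_commands(good="v3.8", bad="v3.8.y-latest"):
        print(shlex.join(cmd))
```

Run inside a linux-stable checkout, the first command bounds the search to the stable series; a bisect between 3.8 and 3.9 in Linus's tree searches a different range of commits.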
Dear Takashi,

(In reply to comment #2)
> Are you seeing this in 3.8.y kernel, or 3.9 kernel? From your description, it
> sounds like you're seeing the problem in 3.8.y stable kernel while the earlier
> 3.8 didn't show it. Or, do you mean that it starts showing with 3.9 while 3.8
> didn't?

Oops, let me clear up the confusion:

* I first saw the problem with Debian's 3.8.y stable kernel, which (obviously) includes the patches that were marked for stable (as is the case for the patch in question).
* I booted Ubuntu's precompiled "mainline" snapshot kernels of version 3.10-rc1, which should be pristine, as they announce (just to save me the trouble of compiling).
* Then I grabbed Linus's kernel, did the bisecting in *his* tree, and found your commit as the one that broke my system.

> In either way, simply try to revert the commit you spotted. It *must* fix the
> problem, if the bisection were correct.

I reverted, compiled, and tested, and things seem fixed. My tests included logging out of X (one source of the stack traces) and hibernating (the other source). The traces no longer happen with the commit reverted.

I hope that this clears up the situation. If you need any other data (dumps, configurations, etc.), please let me know.

Thanks for the quick reply, Rogério Brito.
OK, then it's interesting. I can say that it's just a coincidence that the commit leads to the bug. Maybe HDMI audio starts getting recognized properly by that fix, and this influenced on the behavior of i915 modeset. These stack traces are just results of some warnings, and nothing serious. For further debugging, try to enable the drm debug option, e.g. by passing drm.debug=0x0e at boot time, and get the kernel log until the warning points. The rest is the job of Intel guys :)
Yeah, modeset state checking gone wrong, we need the debug logs ...
Ping for the debug logs ... please boot with drm.debug=0xe added to your kernel cmdline, reproduce the issue and then attach the complete dmesg.
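As a hedged illustration of what the requested option selects, the sketch below decodes a `drm.debug` bitmask and pulls the value out of a kernel command line. The bit names are an assumption based on the DRM core headers of the 3.x era as I recall them (core, driver, kms, prime); the function names are made up for this example.

```python
# Assumed bit meanings for drm.debug in 3.x-era kernels.
DRM_DEBUG_BITS = {0x01: "core", 0x02: "driver", 0x04: "kms", 0x08: "prime"}


def decode_drm_debug(mask):
    """Return the names of the debug categories enabled by `mask`."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


def drm_debug_from_cmdline(cmdline):
    """Return the drm.debug value found on a kernel command line, or None."""
    for token in cmdline.split():
        if token.startswith("drm.debug="):
            # Base 0 accepts both hexadecimal ("0xe") and decimal ("14").
            return int(token.split("=", 1)[1], 0)
    return None


if __name__ == "__main__":
    # /proc/cmdline holds the command line the running kernel was booted with.
    with open("/proc/cmdline") as f:
        value = drm_debug_from_cmdline(f.read())
    if value is None:
        print("drm.debug is not set on the kernel command line")
    else:
        print("drm.debug=%#x enables: %s" % (value, ", ".join(decode_drm_debug(value))))
```

Under that assumption, 0xe turns on everything except the noisy core messages, which is why checking `/proc/cmdline` after reboot is a quick way to confirm the option took effect before reproducing the issue.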
Hi.

On Sun, Jun 16, 2013 at 8:33 AM, <bugzilla-daemon@bugzilla.kernel.org> wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=58361
> --- Comment #6 from Daniel Vetter <daniel@ffwll.ch> 2013-06-16 11:33:51 ---
> Ping for the debug logs ... please boot with drm.debug=0xe added to your kernel
> cmdline, reproduce the issue and then attach the complete dmesg.

Thanks for the ping. One clarification, though: should I boot with that option with the kernel that shows the stack trace? Or with a kernel that doesn't? Anything else?

Thanks,
--
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
http://cynic.cc/blog/ : github.com/rbrito : profiles.google.com/rbrito
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br
We need the debug dmesg from a broken kernel from boot-up up to the first pile of WARNs. The debug output should help us in reconstructing the state and figuring out how things broke ...
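One way to sanity-check a captured log before attaching it, sketched under assumptions: the log is treated as covering boot-up if it still contains the kernel's "Linux version" banner, and a WARN is recognized by the substring "WARNING:". Both heuristics and all function names here are illustrative, not an official tool.

```python
import re

# dmesg lines normally start with a "[ seconds.micros]" timestamp.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def first_warning_index(lines):
    """Index of the first line containing a kernel WARNING, or None."""
    for i, line in enumerate(lines):
        if "WARNING:" in line:
            return i
    return None


def first_warning_time(lines):
    """Timestamp in seconds of the first WARNING line, or None."""
    i = first_warning_index(lines)
    if i is None:
        return None
    match = TIMESTAMP.match(lines[i])
    return float(match.group(1)) if match else None


def covers_boot(lines):
    """True if the log still contains the kernel's "Linux version" banner."""
    return any("Linux version" in line for line in lines)


def log_is_usable(lines):
    """A log is useful here if it spans boot-up through the first WARNING."""
    return covers_boot(lines) and first_warning_index(lines) is not None
```

If the banner is missing, the kernel ring buffer has probably wrapped; capturing the log sooner after boot, or enlarging the buffer, would be the usual remedies.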
Hi, Daniel.

On Sun, Jun 16, 2013 at 9:08 AM, <bugzilla-daemon@bugzilla.kernel.org> wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=58361
> --- Comment #8 from Daniel Vetter <daniel@ffwll.ch> 2013-06-16 12:08:21 ---
> We need the debug dmesg from a broken kernel from boot-up up to the first pile
> of WARNs. The debug output should help us in reconstructing the state and
> figuring out how things broke ...

Grabbing it now. Will attach it in the next post.

Thanks,
--
Rogério Brito
Created attachment 104831 [details] dmesg with boot with drm.debug=0xe
Dear Takashi,

(In reply to comment #4)
> OK, then it's interesting. I can say that it's just a coincidence that the
> commit leads to the bug. Maybe HDMI audio starts getting recognized properly
> by that fix, and this influenced on the behavior of i915 modeset. These stack
> traces are just results of some warnings, and nothing serious.

Indeed, now that I see this, I can get HDMI audio out, which I couldn't when I wrote https://bugzilla.kernel.org/show_bug.cgi?id=51421

As long as there is no corruption in the kernel data structures, I prefer to have HDMI sound and cope with the stack traces instead of the alternative. :)

> For further debugging, try to enable the drm debug option, e.g. by passing
> drm.debug=0x0e at boot time, and get the kernel log until the warning points.
> The rest is the job of Intel guys :)

OK, sent it to Daniel Vetter. Let's see if we can have everything fixed, then.

Thank you all, Rogério Brito.
Ah, the connector state mismatch is on an SDVO output. We've had a few funny bugs in there, can you please retest with latest 3.10-rc kernels and if it's still broken, attach a new drm.debug=0xe dmesg?
(In reply to Daniel Vetter from comment #12)
> Ah, the connector state mismatch is on an SDVO output. We've had a few funny
> bugs in there, can you please retest with latest 3.10-rc kernels and if it's
> still broken, attach a new drm.debug=0xe dmesg?

Rogério, please try the latest 3.11-rc kernels, and if it's still broken, attach dmesg with drm.debug=0xe module parameter. Thank you.
Hi, Jani.

On Thu, Aug 8, 2013 at 6:48 AM, <bugzilla-daemon@bugzilla.kernel.org> wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=58361
>
> --- Comment #13 from Jani Nikula <jani.nikula@intel.com> ---
> (In reply to Daniel Vetter from comment #12)
>> Ah, the connector state mismatch is on an SDVO output. We've had a few funny
>> bugs in there, can you please retest with latest 3.10-rc kernels and if it's
>> still broken, attach a new drm.debug=0xe dmesg?
>
> Rogério, please try the latest 3.11-rc kernels, and if it's still broken,
> attach dmesg with drm.debug=0xe module parameter. Thank you.

OK, I'm heading to bed right now, but as soon as I wake up I will report back.

Thanks,
Created attachment 107160 [details] dmesg from one 3.11-rc kernel with drm.debug=0xe Hi, As promised, attached is the complete dmesg that I get when I boot with the precompiled kernel by Ubuntu, showing the stack trace (starting at about 98 seconds). This was taken from http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/2013-08-08-saucy/ And according to the contents of that directory, it seems that the top commit is: b7bc9e7d808ba55729bd263b0210cda36965be32. Is there any further information that I can provide? Any other trees that I may test to fix this issue? It would be superb if a patch like this could be backported to the -stable kernels. Thanks, Rogério Brito.
SDVO state tracking for Daniel to figure out now that we have the dmesg.
FWIW I don't think the bisect result below is plausible, therefore dropping "bisected" from subject.

commit ea9b43addc4d90ca5b029f47f85ca152320a1e8d
Author: Takashi Iwai <tiwai@suse.de>
Date: Tue Feb 12 17:02:41 2013 +0100

    ALSA: hda - Fix broken workaround for HDMI/SPDIF conflicts
Hm, is there any other bad side-effects than the backtraces?
Hi, Daniel.

On Fri, Aug 9, 2013 at 5:37 AM, <bugzilla-daemon@bugzilla.kernel.org> wrote:
> --- Comment #18 from Daniel Vetter <daniel@ffwll.ch> ---
> Hm, is there any other bad side-effects than the backtraces?

Not that I have noticed (read: I'm only coding right now and I don't see any problems). Of course, I can try a more extensive test, attaching the notebook to an HDMI TV and to an analog/VGA monitor, and send you the results.

Thanks for the feedback,
[ 10.404467] intelfb: Framebuffer driver for Intel(R) 830M/845G/852GM/855GM/865G/915G/915GM/945G/945GM/945GME/965G/965GM chipsets

I do not know how you actually managed to pull this off, but you have two display drivers fighting over the same piece of hw. That tends to end up in tears. Can you please retest with CONFIG_FB_INTEL=n ?
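A minimal sketch for checking whether a given kernel config enables both drivers, assuming the usual `.config` syntax (`CONFIG_X=y`, `CONFIG_X=m`, or `# CONFIG_X is not set`). `CONFIG_DRM_I915` is taken here to be the option for the i915 driver; the helper names are invented for illustration.

```python
import re


def config_value(config_text, option):
    """Return the value of `option` in kernel .config text.

    Gives the text after "=" (for example "y" or "m"), "n" when the option
    is explicitly "not set", or None when it does not appear at all.
    """
    set_re = re.compile(r"^%s=(.*)$" % re.escape(option))
    unset_line = "# %s is not set" % option
    for line in config_text.splitlines():
        line = line.strip()
        if line == unset_line:
            return "n"
        match = set_re.match(line)
        if match:
            return match.group(1)
    return None


def both_drivers_enabled(config_text):
    """True if intelfb and i915 are both built in or built as modules."""
    enabled = ("y", "m")
    return (config_value(config_text, "CONFIG_FB_INTEL") in enabled
            and config_value(config_text, "CONFIG_DRM_I915") in enabled)
```

On Debian and Ubuntu the config of an installed kernel is typically available as `/boot/config-<version>`, so the file contents can be read and passed straight to `both_drivers_enabled`.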
Hi, Daniel.

On Fri, Aug 9, 2013 at 6:15 AM, <bugzilla-daemon@bugzilla.kernel.org> wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=58361
>
> --- Comment #20 from Daniel Vetter <daniel@ffwll.ch> ---
> [ 10.404467] intelfb: Framebuffer driver for Intel(R)
> 830M/845G/852GM/855GM/865G/915G/915GM/945G/945GM/945GME/965G/965GM chipsets
>
> I do not know how you actually managed to pull this off, but you have two
> display drivers fighting over the same piece of hw. That tends to end up in
> tears.

I am going to bed right now, but I can say that I only downloaded Ubuntu's precompiled kernels. I also get the same behavior with Debian's kernels. It's been a long, long, long time since I compiled my own kernels for my notebooks/desktops (apart from specialty/embedded systems), which is to say that I am simply getting what the popular distributions offer.

> Can you please retest with CONFIG_FB_INTEL=n ?

Sure. I will report back after I get some sleep.

Thanks a lot as usual,
(In reply to Rogério Brito from comment #21) > Sure. I will report back after I take some sleep. Ping for the info; hope you had a good sleep. ;)
(In reply to Jani Nikula from comment #22)
> (In reply to Rogério Brito from comment #21)
> > Sure. I will report back after I take some sleep.
>
> Ping for the info; hope you had a good sleep. ;)

Hi there. Thanks for the ping. I recompiled the kernel (version 3.11) and I still get the messages. I am attaching the dmesg log to this bug report. I will also post the config file that I used, so that you people can guide me in further steps (which I am sure will be needed).

Thanks a lot, Rogério.
Created attachment 108761 [details] dmesg from 3.11 kernel with drm.debug=0xe, without CONFIG_FB_INTEL
Created attachment 108771 [details] config file for the 3.11 kernel
ping.
We still seem to have a fairly hard time getting a consistent state out of the sdvo controller. Can you please retest with latest drm-intel-testing from http://cgit.freedesktop.org/~danvet/drm-intel/ to make sure we're not falling over an already fixed bug? It's not a multifunc sdvo chip - those are known to be a bit broken with our code ...
Timeout. Please reopen if the problem persists with recent kernels.