Bug 22122
Summary: | pty lockup at kernel/workqueue.c:1180 | ||
---|---|---|---|
Product: | Drivers | Reporter: | James Cloos (cloos) |
Component: | Other | Assignee: | Tejun Heo (tj) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | alan, arnd, tj |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | trunk | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
wq-debug.patch
wq-debug-1.patch
Dmesg from ff8b16d + the patch from attachment #2
wq-debug-2.patch |
Description
James Cloos
2010-11-04 20:02:01 UTC
Yeah, I tried to root-cause it but haven't been successful yet. Is there any way to reproduce it? Under what conditions does this happen and how often? Thanks.

> Yeah, I tried to root-cause it but haven't been successful yet.
> Is there any way to reproduce it? Under what conditions does
> this happen and how often?
With kernel ff8b16d7e15a it occurs every boot after just a few minutes.
Box is fam10; dist is gentoo ~amd64; gcc is 4.5.1 with the graphite and lto support compiled in (which requires http://www.cs.unipr.it/ppl/ and http://repo.or.cz/w/cloog-ppl.git). (Gentoo does apply some patches to 4.5.1; I believe all from the 4.5 branch.)
I have "console=ttyS0,115200n8r" in the command line, and agetty(8) also runs on ttyS0. I cannot confirm that it works, though; my laptop’s serial port seems to be kaput. (The fam10 is intended as a headless compute node; I use the laptop as an X server.)
I stuck the current config at http://jhcloos.com/t/fam10.config.xz
It has a bunch of speculative enables and probably a few useless ones; I haven’t confirmed the need for everything in it….

Thanks for the input. Hmmm... I tried to reproduce it but haven't been successful yet. It's weird that the other reported case was also related with tty code. Well, at least you can reproduce it somewhat reliably, so that's good. I'm preparing a debug patch. Will post it soon.

Created attachment 37132
wq-debug.patch
Can you please apply this patch, trigger the problem and attach full log?
Thank you.
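
For readers unfamiliar with the "console=ttyS0,115200n8r" parameter quoted in the report: for serial consoles the kernel documents the option string as baud rate, parity (n, o or e), number of data bits, and an optional "r" for RTS flow control, defaulting to 9600n8. The sketch below only decodes that string for illustration; the names `SerialConsole` and `parse_console` are hypothetical and are not part of the kernel or of any patch in this bug.

```python
import re
from typing import NamedTuple, Optional


class SerialConsole(NamedTuple):
    device: str          # e.g. "ttyS0"
    baud: int            # e.g. 115200
    parity: str          # "n", "o" or "e"
    bits: int            # number of data bits
    flow: Optional[str]  # "r" for RTS flow control, else None


# baud, then optional parity, optional data bits, optional "r"
_OPTIONS = re.compile(r"^(\d+)([noe])?(\d)?(r)?$")


def parse_console(param: str) -> SerialConsole:
    """Decode a 'console=<device>[,<options>]' serial console parameter."""
    if not param.startswith("console="):
        raise ValueError("not a console= parameter: %r" % param)
    device, _, options = param[len("console="):].partition(",")
    if not options:
        # Documented default when no options are given: 9600n8
        return SerialConsole(device, 9600, "n", 8, None)
    match = _OPTIONS.match(options)
    if match is None:
        raise ValueError("unrecognised serial options: %r" % options)
    baud, parity, bits, flow = match.groups()
    return SerialConsole(device, int(baud), parity or "n", int(bits or 8), flow)
```

Applied to the string from the report, this yields device ttyS0 at 115200 baud, no parity, 8 data bits, with RTS flow control.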
Created attachment 37142
wq-debug-1.patch
Oops, forgot something. Please use this one.
I will test wq-debug-1.patch later today or tonight.

Testing proved more difficult than expected.
When I added your patch to what had been the most recent version of the kernel I had previously tested, the problem did not occur.
I have CONFIG_LOCALVERSION_AUTO=y in that kernel, though, so adding the patch added '-dirty' to the kernel version; that, of course, caused many files to recompile.
This means that the bug may be compiler-specific, or it may be a more typical heisenbug.
I also tried adding it to the tip, but that version didn't work at all. (The serial console bug.)
I want to test a more recent tip, but need to find a safe way to do so which does not require a console. Hefting it between here and the TV room is a drag. (Said TV is the only available monitor, and I lack a dekametre hdmi cable.)
-JimC

Did the kernel trigger any warning messages and stack dumps with the patch applied? If so, can you please attach full kernel log?

> Did the kernel trigger any warning messages and stack dumps with the patch
> applied? If so, can you please attach full kernel log?
There wasn't a lockup, but looking at the dmesg dumps again I see that
it did output some call traces. Compressed attachment to follow.
Created attachment 37602
Dmesg from ff8b16d + the patch from attachment #2
I was unable to get a dmesg from the then-tip with the patch; that crashed too soon because of the serial-console-related bug.
Production needs will keep me from testing current tip for a while.

Heh, that's interesting. How does the counter go off without triggering the running state sanity check? Weird. I'll prep another debug patch soon. Thank you.

Created attachment 37672
wq-debug-2.patch
Can you please try this patch and report the kernel warnings? Also, please turn on the printk timestamp.
Thank you.
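
A note on the timestamp request: printk timestamps are normally enabled either at build time with CONFIG_PRINTK_TIME=y or at boot with printk.time=1 on the kernel command line. The helper below is a minimal sketch for checking a .config and a command line for those two settings; `printk_time_enabled` is a hypothetical name, and the accepted values ("1", "y", "Y") are an assumption about how the boolean parameter is spelled in practice.

```python
def printk_time_enabled(config_text: str, cmdline: str = "") -> bool:
    """Report whether printk timestamps look enabled.

    config_text: contents of a kernel .config file.
    cmdline: kernel command line, e.g. the contents of /proc/cmdline.
    """
    # Build-time default taken from the kernel configuration.
    enabled = any(
        line.strip() == "CONFIG_PRINTK_TIME=y"
        for line in config_text.splitlines()
    )
    # A printk.time= boot parameter overrides the build-time default;
    # if it is given more than once, the last occurrence is used here.
    for token in cmdline.split():
        if token.startswith("printk.time="):
            enabled = token.split("=", 1)[1] in ("1", "y", "Y")
    return enabled
```

With timestamps enabled, each line of the kernel log carries a time prefix, which makes it easier to order the warnings from the debug patch relative to other messages.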
I’ll give wq-debug-2.patch a try. It probably won’t be until the weekend, though.

Can you please attach .config? Let's see if I can reproduce it. Thanks.

For some reason, James' message couldn't be committed to bugzilla. Forwarding...
James Cloos wrote:
> > Before trying the last patch I thought I should give unpatched tip
> > another try.
> >
> > As it turned out that was the last commit before the rc3 tag.
> >
> > There were, however, three other variables.
> >
> > The gcc-4.5.1 ebuild was updated with a new patchset (mostly taken from
> > upstream svn, akin to the other dists) and I removed the serial console
> > invocation from the command line. There is little point of it given
> > that the laptop’s serial port refuses to work.
> >
> > I also did a make clean, before rebuilding, just in case.
> >
> > I cannot get this compile to generate the lockup.
> >
> > At this point, I’m leaning towards closing this as a compiler error,
> > but I first should test with serial console, just to be sure.