I am not a kernel programmer myself, but when I showed the attached trace to Brad Spengler he suggested I file a report. He also had this to say:
"read_buf needs to be properly synchronized -- right now, only read_head is synchronized with a spinlock. What's happening here is that spinlock is getting held and released in n_tty_flush_buffer() from n_tty_close(), then the read happens (getting past the NULL check, as read_buf hasn't been NULLed yet, though could be in the process of being freed at this point). Then read_buf gets set to NULL and the tty->read_buf[tty->read_head] = c executes, causing the NULL deref."
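The check-then-use pattern described in the quote can be sketched in user space. This is only an illustration of the general idea, not the n_tty code and not a proposed fix: struct demo_tty, demo_open(), demo_close(), demo_put_queue() and BUF_SIZE are hypothetical names, and a pthread mutex stands in for the kernel spinlock. The point is that the NULL check and the store happen under the same lock that the close path takes before clearing the pointer.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

#define BUF_SIZE 4096 /* hypothetical size, must be a power of two here */

/* Hypothetical stand-in for the tty fields named in the quote. */
struct demo_tty {
    pthread_mutex_t lock;     /* plays the role of the spinlock */
    unsigned char *read_buf;  /* NULL once the "ldisc" is closed */
    size_t read_head;
    size_t read_cnt;
};

static void demo_open(struct demo_tty *t)
{
    pthread_mutex_init(&t->lock, NULL);
    t->read_buf = calloc(1, BUF_SIZE);
    assert(t->read_buf != NULL);
    t->read_head = 0;
    t->read_cnt = 0;
}

/* Close path: clear the pointer while holding the lock, free afterwards. */
static void demo_close(struct demo_tty *t)
{
    unsigned char *old;

    pthread_mutex_lock(&t->lock);
    old = t->read_buf;
    t->read_buf = NULL;
    t->read_head = 0;
    t->read_cnt = 0;
    pthread_mutex_unlock(&t->lock);
    free(old);
}

/*
 * Producer path: the NULL check and the store are done under the same
 * lock, so the pointer cannot change between the check and the write.
 * Returns false if the buffer is gone or full.
 */
static bool demo_put_queue(struct demo_tty *t, unsigned char c)
{
    bool stored = false;

    pthread_mutex_lock(&t->lock);
    if (t->read_buf != NULL && t->read_cnt < BUF_SIZE) {
        t->read_buf[t->read_head] = c;
        t->read_head = (t->read_head + 1) & (BUF_SIZE - 1);
        t->read_cnt++;
        stored = true;
    }
    pthread_mutex_unlock(&t->lock);
    return stored;
}
```

If the check were done outside the lock, a concurrent demo_close() could clear read_buf between the check and the store, which is the sequence the quote describes. Whether that sequence can actually occur in the kernel depends on ldisc locking that this sketch does not model.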
I added Joe Peterson as a cc because he was listed by get_maintainer.pl alongside Alan Cox, who, to my understanding, is no longer working on tty.
220.127.116.11 from git e2984cbfddd5c8fac88b24d7e5f28e1cfb6f3838
aufs2-2.6/aufs2-31 from git 122bf939bcc705083bd5a43defcd0c3249f7fe88
I've tried to be complete, but this is my first filing on the kernel bugzilla, so if I forgot to include something, kindly let me know and I'll get back to you.
Created attachment 23786
Raw trace from serial console buffer
Created attachment 23787
Decoded trace file
Hm, we don't have tty or ldisc categories in bugzilla, so I reassigned this to drivers/serial.
I assume this is really a regression.
I wonder if the bug is still present in 2.6.32.
Perhaps you can persuade Mr Spengler to send us a patch ;)
I think he already has a patch right here:
#4 is not related at all.
The bug looks like a regression, possibly added by some chap called Torvalds when he hacked up the n_tty stuff to call back into the ldisc on read ;) Brad's diagnosis looks odd though: the ldisc is locked down at that point, so it won't get closed (a set_ldisc would stall).
Assigned to Greg the tty maintainer.
I've caught this happening one or two more times since filing the bug; traces are available if needed. Brad Spengler has written a patch. If Alan/Greg could review it and comment, it would be greatly appreciated.
Created attachment 24235
n_tty.c patch provided by Brad Spengler.
I already did.
We fix bugs by understanding them, not by randomly applying gunge that tries to hide them.
I wasn't aware that you had already reviewed the patch. Thanks for looking, Alan.
I had read it as well. If someone wants a patch reviewed, please have them submit it in the proper format, as documented in Documentation/SubmittingPatches, so that it can be applied.
That includes a description of what the patch does, and why it does it.
Also, as Alan said, the patch just looks like it papers over the real problem, which
While trying to write a Python program that uses pexpect to communicate with a subprocess, I found a test case that triggers this bug reliably. It takes only two or three attempts to hit the NULL pointer dereference in put_tty_queue().
I don't have a serial console, but took photos of the call trace:
This is on a Core i7-920 running Fedora 12 kernel 18.104.22.168-174.2.3.fc12.x86_64.
I don't know anything about the tty subsystem but I'd be happy to help in debugging this. Any special logging that I should turn on? Patches to try? Or do I have to git-bisect until I find the offender?
Can you try duplicating this on the 22.214.171.124 kernel release?
Kernel 126.96.36.199 (more precisely, 188.8.131.52-36.fc12.x86_64) indeed fixes it!
I had tried 184.108.40.206 (oopses), but I wasn't aware that 2.6.32 was out.
Great, thanks for testing. Marking this closed now.