Most recent kernel where this bug did not occur: UNKNOWN
Hardware Environment: x86
Software Environment: n/a
Problem Description: kernel oopses or crashes can occur in the
n_r3964 line discipline when kmalloc(..GFP_KERNEL..) is invoked
in code paths which run in interrupt context. I suspect two
kmalloc() call sites where GFP_KERNEL should be replaced by GFP_ATOMIC:
1. function on_receive_block after the
comment "/* prepare struct r3964_block_header */"
2. function add_msg after the label "queue_the_message"
Steps to reproduce: contact email@example.com for test prog.
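For readers unfamiliar with the rule behind this report, here is a minimal user-space sketch, not kernel code. All names (model_gfp, in_atomic_context, model_kmalloc, model_timer_handler) are hypothetical stand-ins for the kernel's GFP flags, its context tracking, kmalloc() and a timer callback. The model only counts requests for an allocation that may sleep made while in a context that must not sleep.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's GFP_KERNEL / GFP_ATOMIC. */
enum model_gfp { MODEL_GFP_KERNEL, MODEL_GFP_ATOMIC };

/* True while the model is "running" a timer or interrupt handler. */
static bool in_atomic_context = false;

/* Number of may-sleep allocations requested from atomic context. */
static int invalid_sleep_count = 0;

/* Model allocator: flags a may-sleep request made where sleeping is illegal. */
static void *model_kmalloc(size_t size, enum model_gfp flags)
{
    if (flags == MODEL_GFP_KERNEL && in_atomic_context)
        invalid_sleep_count++;
    return malloc(size);
}

/* Model of a timer callback that allocates a message buffer. */
static void model_timer_handler(enum model_gfp flags)
{
    in_atomic_context = true;
    void *msg = model_kmalloc(32, flags);
    free(msg);
    in_atomic_context = false;
}
```

Calling model_timer_handler(MODEL_GFP_KERNEL) increments invalid_sleep_count, while model_timer_handler(MODEL_GFP_ATOMIC) leaves it unchanged.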
Created attachment 8269
n_r3964, use GFP_ATOMIC in some kmalloc()s
Does that patch fix everything up?
If so, then would I be correct in believing that the "oopses" were in
fact "sleeping function called from invalid context" warnings?
Well I don't understand the on_receive_block() change - as far as I can
tell GFP_KERNEL is legal there. At least, when the caller is from tty_io.c.
It would really help to understand this problem if we could see the dmesg
output showing the call stack up to these two sites, please.
Created attachment 8277
2 oopses from /var/log/messages
These oopses occurred on FC5 2.6.16-1.2122_FC5 i686
when two serial lines with n_r3964 were run against
each other using a null modem cable, some seconds
after breaking the serial connection.
OK, thanks. add_msg() is called from a timer handler.
But what about on_receive_block()?
I think the one in on_receive_block() shouldn't be necessary. Even with
tty->low_latency set, we should never end up here in a context where we can't
sleep... I think.
The other change (in add_msg()) means that we're allocating with GFP_ATOMIC even
in the _common_ case, for received data packets.
Which is worse? Using GFP_ATOMIC when we don't need to, or something ugly like
pMsg = kmalloc(sizeof(struct r3964_message),
GFP_ATOMIC is unreliable. Given a choice between ugly and unreliable
I guess we get ugly.
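The "ugly" option can be sketched in user space as follows. This is not the driver code and not any patch attached to this bug. The enum and model_pick_gfp() are hypothetical names, and the assumption is that the caller knows whether it may be running from a timer handler.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's GFP_KERNEL / GFP_ATOMIC. */
enum model_gfp { MODEL_GFP_KERNEL, MODEL_GFP_ATOMIC };

/*
 * Assumption: the caller tells us whether it may run in a context that
 * cannot sleep. Only that path pays for the less reliable atomic
 * allocation; the common path keeps the allocation that may sleep.
 */
static enum model_gfp model_pick_gfp(bool caller_is_atomic)
{
    return caller_is_atomic ? MODEL_GFP_ATOMIC : MODEL_GFP_KERNEL;
}
```

The point of the conditional is that the common receive path keeps the more reliable allocation, and only the timer-driven path falls back to the atomic one.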
Created attachment 8279
Patch to use GFP_ATOMIC conditionally
Agreed. Let's do it like this then.
Christian, please could you confirm that this also fixes the problem? And that
you've never actually seen the problem occur in on_receive_block()?
The attachment (id=8279) cured my problem.
Tested with two pl2303 USB-serial converters
against each other, ran for 30 minutes with
null modem plugged and sometimes unplugged.
No further oopses were observed.
Thanks for the confirmation.