Bug 6659

Summary: n_r3964 line discipline: kernel oopses in time-out situations
Product: Drivers Reporter: Christian Werner (chw)
Component: SerialAssignee: David Woodhouse (dwmw2)
Status: RESOLVED CODE_FIX    
Severity: normal CC: akpm
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: all 2.6 Subsystem:
Regression: --- Bisected commit-id:
Attachments: n_r3964, use GPF_ATOMIC in some kmalloc()s
2 oopses from /var/log/messages
Patch to use GFP_ATOMIC conditionally

Description Christian Werner 2006-06-07 00:37:56 UTC
Most recent kernel where this bug did not occur: UNKNOWN

Distribution: any

Hardware Environment: x86

Software Environment: n/a

Problem Description: kernel oopses or crashes can occur in the
n_r3964 line discipline when kmalloc(..GFP_KERNEL..) is invoked
in code paths which run in interrupt contexts. I suspect two
places of kmalloc where GFP_KERNEL should be replaced by GFP_ATOMIC:

1. function on_receive_block after the
   comment "/* prepare struct r3964_block_header */"
2. function add_msg after the label "queue_the_message"

Steps to reproduce: contact chw@ch-werner.de for test prog.
Comment 1 Christian Werner 2006-06-07 01:51:40 UTC
Created attachment 8269 [details]
n_r3964, use GPF_ATOMIC in some kmalloc()s
Comment 2 Andrew Morton 2006-06-08 23:30:01 UTC
Does that patch fix everything up?

If so, then would I be correct in believing that the "oopses" were in
fact "sleeping function called from invalid context" warnings?

Thanks.
Comment 3 Andrew Morton 2006-06-08 23:38:06 UTC
Well I don't understand the on_receive_block() change - as far as I can
tell GFP_KERNEL is legal there.  At least, when the caller is from tty_io.c

It would really help to understand this problem if we could see the dmesg
output showing the call stack up to these two sites, please.
Comment 4 Christian Werner 2006-06-08 23:49:28 UTC
Created attachment 8277 [details]
2 oopses from /var/log/messages

These oopses occured on FC5 2.6.16-1.2122_FC5 i686
when two serial lines with n_r3964 where run against
each other using a null modem cable some seconds
after breaking the serial connection.
Comment 5 Andrew Morton 2006-06-08 23:57:24 UTC
OK, thanks.  add_msg() is called from timer handler.

But what about on_receive_block()?
Comment 6 David Woodhouse 2006-06-09 03:37:48 UTC
I think the one in on_receive_block() shouldn't be necessary. Even with
tty->low_latency set, we should never end up here in a context where we can't
sleep... I think. 

The other change (in add_msg()) means that we're allocating with GFP_ATOMIC even
in the _common_ case, for received data packets. 

Which is worse? Using GFP_ATOMIC when we don't need to, or something ugly like
this...

      pMsg = kmalloc(sizeof(struct r3964_message),
                     (arg==R3964_MSG_DATA)?GFP_KERNEL:GFP_ATOMIC);

Comment 7 Andrew Morton 2006-06-09 03:43:25 UTC
GFP_ATOMIC is unreliable.  Given a choice between ugly and unreliable
I guess we get ugly.

Comment 8 David Woodhouse 2006-06-09 03:57:54 UTC
Created attachment 8279 [details]
Patch to use GFP_ATOMIC conditionally

Agreed. Let's do it like this then.

Christian, please could you confirm that this also fixes the problem? And that
you've never actually seen the problem occur in on_receive_block()?
Comment 9 Christian Werner 2006-06-09 14:13:29 UTC
The attachment (id=8279) cured my problem.
Tested with two pl2303 USB-serial converters
against each other, ran for 30 minutes with
null modem plugged and sometimes unplugged.
No further oopses were observed.
Comment 10 David Woodhouse 2006-06-09 15:05:05 UTC
Thanks for the confirmation.