Bug 20732 - Kernel locks up in macb interrupt handler on a flooded network
Status: RESOLVED CODE_FIX
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: All Linux
Importance: P1 normal
Assigned To: Nicolas Ferre
Duplicates: 20492 20502 20512 20522 20532 20542 20552 20562 20572 20582
Depends on:
Blocks:
Reported: 2010-10-18 20:21 UTC by joshua.hoke
Modified: 2012-08-14 11:15 UTC
CC List: 3 users

See Also:
Kernel Version: 2.6.27.x, 2.6.36-rc8
Tree: Mainline
Regression: No


Attachments
Reporter's patch (4.68 KB, patch)
2010-10-18 20:21 UTC, joshua.hoke
E-mail with patch (2.90 KB, text/plain)
2010-10-25 11:38 UTC, joshua.hoke

Description joshua.hoke 2010-10-18 20:21:19 UTC
Created attachment 33982
Reporter's patch

On a busy network, the macb driver could get stuck in the interrupt handler, quickly triggering the watchdog, due to a confluence of factors:

 1. macb_poll re-enables interrupts unconditionally, even when it will be called again because it exhausted its RX budget
 2. macb_interrupt only disables the RX interrupt when it can schedule macb_poll, but scheduling fails when macb_poll is already scheduled because macb_poll didn't call napi_complete
 3. macb_interrupt loops until the interrupt status register is clear, which will never happen if it doesn't disable the RX interrupt

Since macb_interrupt runs in interrupt context, this effectively locks up the machine, triggering the watchdog timer. (The RT kernel may have different behavior, but it's still a bug in the driver.)

This issue was readily reproducible on a flooded network with the 2.6.27.48 kernel. The same problems appear to still be in 2.6.36-rc8, so I am submitting this bug report.

The attached patches may need some cleanup, but they fix the problem for me. (The second one fixes a theoretical problem with the first.)
Comment 1 Andrew Morton 2010-10-18 23:27:51 UTC
Please don't send patches via bugzilla - it causes lots of problems with
our usual patch management and review processes.

Please send this patch via email as per Documentation/SubmittingPatches. 
Suitable recipients may be found via scripts/get_maintainer.pl.  Please
also cc myself on the email.

Thanks.
Comment 2 joshua.hoke 2010-10-19 16:54:14 UTC
I have now done so.
Comment 3 joshua.hoke 2010-10-25 11:38:05 UTC
Created attachment 34972
E-mail with patch 

David Miller reported that the e-mail I sent was broken. I will try sending it again, but in case the Outlook hint I found doesn't work (even the first one looks fine in my Sent folder), I am attaching the contents here.
Comment 4 kernelbugs 2010-11-18 16:32:59 UTC
Bugs 20492, 20502, 20512, 20522, 20532, 20542, 20552, 20562, 20572, and 20582 are all (I believe) duplicates of this one, but I don't see how to mark them that way.
Comment 5 Nicolas Ferre 2010-11-18 17:04:43 UTC
*** Bug 20492 has been marked as a duplicate of this bug. ***
Comment 6 Nicolas Ferre 2010-11-18 17:05:27 UTC
*** Bug 20502 has been marked as a duplicate of this bug. ***
Comment 7 Nicolas Ferre 2010-11-18 17:06:07 UTC
*** Bug 20512 has been marked as a duplicate of this bug. ***
Comment 8 Nicolas Ferre 2010-11-18 17:06:58 UTC
*** Bug 20522 has been marked as a duplicate of this bug. ***
Comment 9 Nicolas Ferre 2010-11-18 17:08:52 UTC
*** Bug 20532 has been marked as a duplicate of this bug. ***
Comment 10 Nicolas Ferre 2010-11-18 17:09:40 UTC
*** Bug 20542 has been marked as a duplicate of this bug. ***
Comment 11 Nicolas Ferre 2010-11-18 17:11:32 UTC
*** Bug 20552 has been marked as a duplicate of this bug. ***
Comment 12 Nicolas Ferre 2010-11-18 17:12:12 UTC
*** Bug 20562 has been marked as a duplicate of this bug. ***
Comment 13 Nicolas Ferre 2010-11-18 17:12:51 UTC
*** Bug 20572 has been marked as a duplicate of this bug. ***
Comment 14 Nicolas Ferre 2010-11-18 17:13:20 UTC
*** Bug 20582 has been marked as a duplicate of this bug. ***
Comment 15 Nicolas Ferre 2010-11-18 17:14:37 UTC
(In reply to comment #4)
> Bugs 20492, 20502, 20512, 20522, 20532, 20542, 20552, 20562, 20572, and 20582
> are all (I believe) duplicates of this one, but I don't see how to mark them
> that way.

I have done it. (It is huge! ;-) )
Comment 16 kernelbugs 2010-11-18 17:49:02 UTC
I believe the duplicates came when I submitted the bug; I kept trying and getting an SQL error page from Bugzilla, but it must have filed them anyway. Oops!

I will also note that Dave Miller committed the fix above as:

http://git.kernel.org/linus/b336369c1e1ad88495895260a9068eb18bc48b6c

and it is in v2.6.37-rc1.
