Bug 20732

Summary: Kernel locks up in macb interrupt handler on a flooded network
Product: Drivers
Reporter: joshua.hoke
Component: Network
Assignee: Deleted User (devnull+deleted_20230328)
Status: RESOLVED CODE_FIX
Severity: normal
CC: akpm, alan, kernelbugs
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.27.x, 2.6.36-rc8
Subsystem:
Regression: No
Bisected commit-id:
Attachments: Reporter's patch, E-mail with patch

Description joshua.hoke 2010-10-18 20:21:19 UTC
Created attachment 33982
Reporter's patch

On a busy network, the macb driver could get stuck in the interrupt handler, quickly triggering the watchdog, due to a confluence of factors:

 1. macb_poll re-enables interrupts unconditionally, even when it will be called again because it exhausted its rx budget
 2. macb_interrupt only disables interrupts when it can schedule macb_poll, but scheduling fails when macb_poll is already scheduled because it didn't call napi_complete
 3. macb_interrupt loops until the interrupt status register is clear, which will never happen if it doesn't disable the RX interrupt

Since macb_interrupt runs in interrupt context, this effectively locks up the machine, triggering the watchdog timer. (The RT kernel may have different behavior, but it's still a bug in the driver.)
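
To make the interaction of the three factors easier to follow, here is a small user-space model of it. This is only a sketch: it is not the driver code and not the attached patch, and every name in it (toy_nic, toy_interrupt, toy_poll, toy_second_interrupt, the always_mask and reenable_unconditionally flags) is made up for illustration. It also assumes that a masked interrupt source does not show up in the status the handler loops on, and it caps the handler loop so that the lockup shows up as a return value of -1 instead of an actual hang.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model of the lockup. Hypothetical names, not kernel code. */
struct toy_nic {
    bool rx_pending;     /* frames are waiting to be processed */
    bool rx_irq_enabled; /* RX interrupt is unmasked */
    bool poll_scheduled; /* the poll routine is already queued */
};

/* Modelling assumption: a pending source is only visible while unmasked. */
static bool toy_irq_status(const struct toy_nic *nic)
{
    return nic->rx_pending && nic->rx_irq_enabled;
}

/*
 * Models the handler loop from point 3: spin until the status is clear.
 * Returns the number of iterations, or -1 if max_iters was reached,
 * which stands in for the machine locking up.
 */
static int toy_interrupt(struct toy_nic *nic, bool always_mask, int max_iters)
{
    int iters = 0;

    while (toy_irq_status(nic)) {
        if (iters++ >= max_iters)
            return -1;
        if (always_mask)
            nic->rx_irq_enabled = false; /* mask before trying to schedule */
        if (!nic->poll_scheduled) {      /* point 2: fails if already queued */
            nic->rx_irq_enabled = false;
            nic->poll_scheduled = true;
        }
    }
    return iters;
}

/* Models one poll pass; budget_exhausted means more frames remain. */
static void toy_poll(struct toy_nic *nic, bool budget_exhausted,
                     bool reenable_unconditionally)
{
    assert(nic->poll_scheduled);
    if (!budget_exhausted) {
        nic->rx_pending = false;
        nic->poll_scheduled = false;     /* stands in for napi_complete */
    }
    if (reenable_unconditionally || !budget_exhausted)
        nic->rx_irq_enabled = true;      /* point 1 when unconditional */
}

/* Interrupt, then a poll that exhausts its budget, then a second interrupt. */
static int toy_second_interrupt(bool always_mask, bool reenable_unconditionally)
{
    struct toy_nic nic = { true, true, false };

    toy_interrupt(&nic, always_mask, 1000);
    toy_poll(&nic, true, reenable_unconditionally);
    return toy_interrupt(&nic, always_mask, 1000);
}
```

In this model, toy_second_interrupt(false, true) reproduces the reported sequence and hits the iteration cap, while changing either flag lets the second handler call return: either the poll routine leaves the interrupt masked while it is still scheduled, or the handler masks it regardless of whether scheduling succeeded.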

This issue was readily reproducible on a flooded network with the 2.6.27.48 kernel. The same problems appear to still be in 2.6.36-rc8, so I am submitting this bug report.

The attached patches may need some cleanup but fix the problem for me. (The second one fixes a theoretical problem with the first.)
Comment 1 Andrew Morton 2010-10-18 23:27:51 UTC
Please don't send patches via bugzilla - it causes lots of problems with
our usual patch management and review processes.

Please send this patch via email as per Documentation/SubmittingPatches. 
Suitable recipients may be found via scripts/get_maintainer.pl.  Please
also cc myself on the email.

Thanks.
Comment 2 joshua.hoke 2010-10-19 16:54:14 UTC
I have now done so.
Comment 3 joshua.hoke 2010-10-25 11:38:05 UTC
Created attachment 34972
E-mail with patch

David Miller reported that the e-mail I sent was broken. I will try sending it again, but in case the Outlook hint I found doesn't work (even the first e-mail looks fine in my Sent folder), I am attaching the contents here.
Comment 4 kernelbugs 2010-11-18 16:32:59 UTC
Bugs 20492, 20502, 20512, 20522, 20532, 20542, 20552, 20562, 20572, and 20582 are all (I believe) duplicates of this one, but I don't see how to mark them that way.
Comment 5 Deleted User 2010-11-18 17:04:43 UTC
*** Bug 20492 has been marked as a duplicate of this bug. ***
Comment 6 Deleted User 2010-11-18 17:05:27 UTC
*** Bug 20502 has been marked as a duplicate of this bug. ***
Comment 7 Deleted User 2010-11-18 17:06:07 UTC
*** Bug 20512 has been marked as a duplicate of this bug. ***
Comment 8 Deleted User 2010-11-18 17:06:58 UTC
*** Bug 20522 has been marked as a duplicate of this bug. ***
Comment 9 Deleted User 2010-11-18 17:08:52 UTC
*** Bug 20532 has been marked as a duplicate of this bug. ***
Comment 10 Deleted User 2010-11-18 17:09:40 UTC
*** Bug 20542 has been marked as a duplicate of this bug. ***
Comment 11 Deleted User 2010-11-18 17:11:32 UTC
*** Bug 20552 has been marked as a duplicate of this bug. ***
Comment 12 Deleted User 2010-11-18 17:12:12 UTC
*** Bug 20562 has been marked as a duplicate of this bug. ***
Comment 13 Deleted User 2010-11-18 17:12:51 UTC
*** Bug 20572 has been marked as a duplicate of this bug. ***
Comment 14 Deleted User 2010-11-18 17:13:20 UTC
*** Bug 20582 has been marked as a duplicate of this bug. ***
Comment 15 Deleted User 2010-11-18 17:14:37 UTC
(In reply to comment #4)
> Bugs 20492, 20502, 20512, 20522, 20532, 20542, 20552, 20562, 20572, and 20582
> are all (I believe) duplicates of this one, but I don't see how to mark them
> that way.

I have done it. (it is huge! ;-) )
Comment 16 kernelbugs 2010-11-18 17:49:02 UTC
I believe the duplicates came when I submitted the bug; I kept trying and getting an SQL error page from Bugzilla, but it must have filed them anyway. Oops!

I will also note that Dave Miller committed the fix above as:

http://git.kernel.org/linus/b336369c1e1ad88495895260a9068eb18bc48b6c

and it is in v2.6.37-rc1.