Bug 15091

Summary: starfire causes kernel BUG when interface goes up
Product: Drivers
Reporter: Michael Moffatt (michael)
Component: Network
Assignee: drivers_network (drivers_network)
Status: RESOLVED CODE_FIX
Severity: normal
CC: akpm, alan
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.32
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments:
  ls -l /dev (before crash)
  dmesg (post crash)
  lsmod (post crash)
  lspci (before crash)
  /var/log/messages (post crash)
  /var/log/syslog (post crash)

Description Michael Moffatt 2010-01-20 04:29:18 UTC
Created attachment 24651
ls -l /dev (before crash)

I formerly used 2.6.20 and 2.6.24 with a couple of starfire 4 port ethernet cards. On 2.6.32 the interfaces don't start on boot and when I issue "ifconfig ethX up" (where X is a starfire port).

Sometimes the exception causes the whole kernel to freeze. Sometimes the kernel keeps going. On the occasion that the kernel kept going I was able to retrieve syslog, which has the full kernel information.

Note that in syslog, you can see that I inserted a USB memory stick in order to copy off the attached files. The kernel oops happens without the USB memory stick inserted.

I can reproduce this at will. At the moment I simply can't use my two four port starfire network cards.

This PC is a root-over-NFS system.
Comment 1 Michael Moffatt 2010-01-20 04:29:51 UTC
Created attachment 24652
dmesg (post crash)
Comment 2 Michael Moffatt 2010-01-20 04:30:31 UTC
Created attachment 24653
lsmod (post crash)
Comment 3 Michael Moffatt 2010-01-20 04:31:06 UTC
Created attachment 24654
lspci (before crash)
Comment 4 Michael Moffatt 2010-01-20 04:31:37 UTC
Created attachment 24655
/var/log/messages (post crash)
Comment 5 Michael Moffatt 2010-01-20 04:32:05 UTC
Created attachment 24656
/var/log/syslog (post crash)
Comment 6 Michael Moffatt 2010-01-20 04:34:18 UTC
That first paragraph should read:

I formerly used 2.6.20 and 2.6.24 with a couple of starfire 4 port ethernet
cards. On 2.6.32 the interfaces don't start on boot and when I issue "ifconfig
ethX up" (where X is a starfire port) *there is an exception*.
Comment 7 Alan 2010-01-25 13:47:40 UTC
Any chance you can try a few kernels in between to see where it broke? At that point we can try to narrow down which change caused the problem.
Comment 8 Andrew Morton 2010-01-26 01:08:40 UTC
marked as a regression.
Comment 9 Andrew Morton 2010-01-26 01:09:08 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Wed, 20 Jan 2010 04:29:20 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=15091
> 
>            Summary: starfire causes kernel BUG when interface goes up
>            Product: Drivers
>            Version: 2.5
>     Kernel Version: 2.6.32
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Network
>         AssignedTo: drivers_network@kernel-bugs.osdl.org
>         ReportedBy: michael@moffatt.org.nz
>         Regression: No
> 
> 
> Created an attachment (id=24651)
>  --> (http://bugzilla.kernel.org/attachment.cgi?id=24651)
> ls -l /dev (before crash)
> 
> I formerly used 2.6.20 and 2.6.24 with a couple of starfire 4 port ethernet
> cards. On 2.6.32 the interfaces don't start on boot and when I issue
> "ifconfig ethX up" (where X is a starfire port).
>
> Sometimes the exception causes the whole kernel to freeze. Sometimes the
> kernel keeps going. On the occasion that the kernel kept going I was able to
> retrieve syslog, which has the full kernel information.
>
> Note that in syslog, you can see that I inserted a USB memory stick in order
> to copy off the attached files. The kernel oops happens without the USB memory
> stick inserted.
>
> I can reproduce this at will. At the moment I simply can't use my two four
> port starfire network cards.
> 
> This PC is a root-over-NFS system.
> 

Starfire is triggering the BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
in napi_enable().

This is a regression somewhere between 2.6.24 and 2.6.32(!).
Comment 10 Michael Moffatt 2010-01-26 01:44:39 UTC
Hi Andrew,

I believe that this is a regression, yes.

I will attempt to compile up some kernels this week and provide more 
info. Should I start at 26 and go up or at 31 and go down?

I can't use anything lower than 26 according to udev. I was running 24 
but compiled 32 when I upgraded udev.

Regards,
Michael.

Comment 11 Andrew Morton 2010-01-26 01:51:45 UTC
On Tue, 26 Jan 2010 14:44:31 +1300 Michael <michael@moffatt.org.nz> wrote:

> Hi Andrew,
> 
> I believe that this is a regression, yes.
> 
> I will attempt to compile up some kernels this week and provide more 
> info. Should I start at 26 and go up or at 31 and go down?
> 
> I can't use anything lower than 26 according to udev. I was running 24 
> but compiled 32 when I upgraded udev.
> 

Thanks.

Starfire is a pretty rarely-used driver, I suspect.  Hopefully someone
who understands the NAPI stuff can look at the code and go "ah-hah",
and save you all that work.

But if that doesn't happen then yup, a bisection would be good, thanks.
The best way to do it really is with git.
http://landley.net/writing/git-quick.html has an explanation.
Comment 12 Andrew Morton 2010-01-26 02:15:47 UTC
Added starfire-clean-up-properly-if-firmware-loading-fails.patch to -mm.