Bug 3156

Summary: (net de2104x) Kernel panic with de2104x tulip driver on boot
Product: Drivers
Reporter: Spauldo da Hippie (spauldo)
Component: Network
Assignee: Grant Grundler (grundler)
Status: CLOSED CODE_FIX
Severity: high
CC: grundler, kyle, protasnb
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.7 stock kernel
Subsystem:
Regression: ---
Bisected commit-id:

Description Spauldo da Hippie 2004-08-04 12:44:28 UTC
Distribution: Debian unstable
Hardware Environment:
  Pentium 75
  Linksys LNE100TX 10/100 Network Adapter
  Zynx ZX312(?) 10Mbit Network Adapter
    Chip: Digital 21040-AA (other numbers: DC1012B, 21-40593-03)
    TP, BNC, AUI Connectors
  Cirrus Logic GD5426 Video Adapter
  Intel PCIset Chipset (hard to see in there)
  Intel PIIX IDE Controller (onboard)
Software Environment:
  Debian Unstable (Late July)
  IPTables Modules (ipv4, ipv6)
  Hotplug subsystem
Problem Description:
  On boot, with both network cards installed, hotplug detects both cards.
  On Debian, networking starts right after hotplug, and a few seconds after
  the network is brought up I get either a kernel panic in de2104x.c (see
  below) or one involving the swapper (which I haven't got a copy of).  With
  the Zynx card removed, the system boots fine.
  Also, strangely enough (and in agreement with a similar report on the LKML),
  if the Zynx card has a link (i.e. a network cable is plugged in and the link
  is up), it doesn't panic.  My current setup doesn't really allow me to test
  whether the card actually works in that case, however.  I also can't test
  the AUI or BNC interfaces.

  Panic Info (copied by hand, some 0's might be 8's - the monitor was blurry):

eth1: timeout expired stopping DMA
-----------------[ cut here ]-----------------
kernel BUG at drivers/net/tulip/de2104x.c:919!
invalid operand: 0000 [#1]
PREEMPT
Modules linked in: ipv6 de2104x tulip
CPU:    0
EIP     0060:[<c4819ede>]    Not tainted
EFLAGS: 00010206   (2.6.7)
EIP is at de_set_media+0x1e/0x130 [de2104x]
eax: fc200100   ebx: 00000292   ecx: c028c6b0   edx: c4811800
esi: c3f3a220   edi: 00000002   ebp: fffc0000   esp: c0314f8c
ds: 007b   es: 007b   ss: 0068
Process net.agent (pid: 1016, threadinfo=c0314000 task=c3f6c5f0)
Stack: 00000292 c0314000 c3f3a220 ffffffc6 c481a199 c3f3a220 c3f3a000 00000002
       c3a22480 c3f3a220 c481a040 c0314000 c0314fcc c011918e c3f3a220 c0314fcc
       c0314fcc c0314fcc 00000000 00000011 c031d208 0000000a 00000000 c0115723
Call Trace:
Stack pointer is garbage, not printing trace
Code: 0f 0b 97 03 ac b6 81 c4 f6 86 a0 05 00 00 01 74 0a c7 42 58
 <0>Kernel Panic: fatal exception in interrupt
In interrupt handler - not syncing

Steps to reproduce:
  Boot with the card above (or possibly any card with the same chipset)
  without plugging in the cable.  Bring up the interface and wait a few seconds.
Comment 1 Spauldo da Hippie 2004-08-05 19:23:31 UTC
Sorry, note that when I say "stock kernel", I mean the stock Linus kernel, not
the Debian kernel.  Compiled with gcc 3.3.
Comment 2 Natalie Protasevich 2007-09-22 18:45:22 UTC
Spauldo, since it's been a while, can we refresh the status? How does it work with recent kernels?
Thanks.
Comment 3 Grant Grundler 2007-11-02 23:28:42 UTC
This bug is missing a stack trace. I don't know whether that kernel can't provide one, or whether BUG() just prints the message without a stack trace and continues running.

But looking at the 2.6.7 tree, at least one bug is obvious to me:
   "eth1: timeout expired stopping DMA" message is from de_stop_rxtx().

de_stop_rxtx() is supposed to wait until the MAC TX/RX engines have reached the "stopped" state. This can take up to 1200us at the 10BT link rate. The "work" loop has no udelay() and only polls 1000 times. Since each MMIO read takes about 1 microsecond, the code waits only about 1000us in total AND disrupts the DMA stream while polling. I can fix that in the current code with this:
    http://iou.parisc-linux.org/~grundler/diff/diff-2.6.23-de2104x_stop_rxtx-01


The next message, "kernel BUG at drivers/net/tulip/de2104x.c:919", is just a result of TX/RX still running. This could also happen during shutdown if "large" (more than 1K bytes) TX frames are in flight, i.e. it can be a problem in cases other than "No Link".
Comment 4 Natalie Protasevich 2008-03-03 21:50:44 UTC
Since the reporter is not responding, it's up to you, Grant, to decide whether to close the bug and submit the patch.
Comment 5 Grant Grundler 2008-03-05 08:27:18 UTC
Part of the "problem" was fixed here:
http://www.mail-archive.com/netdev@vger.kernel.org/msg62633.html

(replace BUG_ON with printk warning).

I'll submit the diff-2.6.23-de2104x_stop_rxtx-01 patch today. (Comment #3)
Once it's accepted I'll close the bug.

thanks for the reminder,
grant