Bug 7116

Summary: pcnet_cs no longer works after linux-2.6.16-git20 (Too much work at interrupt)
Product: Drivers
Reporter: Ryan Underwood (nemesis)
Component: PCMCIA
Assignee: linux-pcmcia
Status: CLOSED INSUFFICIENT_DATA
Severity: normal
CC: alan, daniel.ritz, komurojun-mbn, linux
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.20, 2.6.17-rc1, 2.6.17, 2.6.18
Regression: No
Attachments: config.opts
dmesg
dmesg
lspci
lspci
the bootup log of 2.6.18
test patch
pcmcia-fixes-2.6.git
lspcmcia
dmesg 2.6.19-rc5
dmesg 2.6.25.8
lspci 2.6.19-rc5
lspci 2.6.25.8
dmesg with pcmcia debug
lspcmcia 2.6.19-rc5
lspcmcia 2.6.25.8
dmesg with modules pre-inserted
lspcmcia with modules pre-inserted

Description Ryan Underwood 2006-09-06 14:53:20 UTC
Most recent kernel where this bug did not occur: 2.6.16-git20
Distribution: Debian
Hardware Environment: Toshiba 500CDT, ToPIC95-B, IBM H&A (Home and Away) 10 Mbps Ethernet/modem
Software Environment: Problem is present in 2.6.17-rc1, 2.6.17, and current git
Problem Description:

I have this pcnet_cs card, which has always worked without problems up to
2.6.16.28.  I just tried to upgrade to 2.6.17, and the kernel log is
flooded with

eth0: Too much work at interrupt, status 0x22

along with the occasional

eth0: pcnet_reset_8390() did not complete.

and no network traffic gets through.  I tried removing and reinserting the card
and driver, with no improvement.

I can see where this message is generated in 8390.c, but I am baffled, since that
file has not changed.  pcnet_cs.c did change, but as far as I can tell the changes
are mostly API updates.
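The status value in that message is the 8390's Interrupt Status Register (ISR), as read back by the interrupt handler in 8390.c. As a rough sketch, assuming the standard DP8390 ISR bit layout and the ENISR_* names used in drivers/net/8390.h (the decode_isr helper itself is hypothetical, for illustration only), 0x22 decodes to "transmit complete" plus "counters need emptying":

```python
# Bit names follow the ENISR_* constants in drivers/net/8390.h, which
# mirror the DP8390 Interrupt Status Register (ISR) layout.
ISR_BITS = [
    (0x01, "ENISR_RX"),        # packet received without error
    (0x02, "ENISR_TX"),        # packet transmitted without error
    (0x04, "ENISR_RX_ERR"),    # receive error
    (0x08, "ENISR_TX_ERR"),    # transmit error
    (0x10, "ENISR_OVER"),      # receive ring overwritten
    (0x20, "ENISR_COUNTERS"),  # tally counters need emptying
    (0x40, "ENISR_RDC"),       # remote DMA complete
    (0x80, "ENISR_RESET"),     # reset completed
]


def decode_isr(status: int) -> list:
    """Return the names of the ISR bits set in `status` (hypothetical helper)."""
    return [name for mask, name in ISR_BITS if status & mask]


if __name__ == "__main__":
    print(decode_isr(0x22))  # ['ENISR_TX', 'ENISR_COUNTERS']
```

As far as can be told from 8390.c, the warning is printed when the handler still sees pending ISR bits after its fixed number of service passes.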

Steps to reproduce:
Insert and attempt to use the aforementioned pcnet_cs card with any kernel
later than linux-2.6.16-git20.
I also tried the disable_clkrun option of yenta_socket, with no effect.
Comment 1 Ryan Underwood 2006-09-06 17:37:34 UTC
http://pcmcia-cs.sourceforge.net/ftp/doc/PCMCIA-HOWTO-6.html
Where can I find the dump_cis and dump_cisreg equivalents for the new PCMCIA 
subsystem?
Comment 2 Daniel Ritz 2006-09-23 10:23:41 UTC
Tried that with 2.6.18 on a Toshiba 530CDT (should be pretty much the same HW)
and a Natsemi network card (also the pcnet_cs driver). It worked just fine. A
Xircom card also works without problems. So could you try 2.6.18 and, if it is
not working, give the output of dmesg and lspci -vvv from a working and a
non-working kernel? Also post a copy of your config.opts.

dump_cis and cbdump can be built using "make debugtools". There is no dump_cisreg...
Comment 3 Ryan Underwood 2006-09-25 08:11:07 UTC
Created attachment 9092 [details]
config.opts
Comment 4 Ryan Underwood 2006-09-25 08:11:33 UTC
Created attachment 9093 [details]
dmesg
Comment 5 Ryan Underwood 2006-09-25 08:11:52 UTC
Created attachment 9094 [details]
dmesg
Comment 6 Ryan Underwood 2006-09-25 08:12:17 UTC
Created attachment 9095 [details]
lspci
Comment 7 Ryan Underwood 2006-09-25 08:12:32 UTC
Created attachment 9096 [details]
lspci
Comment 8 Ryan Underwood 2006-09-25 08:15:58 UTC
Created attachment 9097 [details]
the bootup log of 2.6.18
Comment 9 Ryan Underwood 2006-09-25 08:18:28 UTC
Maybe it has to do with the "driver needs updating to support shared IRQ" message
in dmesg?
Comment 10 Daniel Ritz 2006-10-01 06:21:46 UTC
Looking at the files, it's not obvious what's wrong. I also tried your
config.opts on my Tecra 530, but it works. Looking at the pcnet_cs and pcmcia
changes, there's nothing obvious either. I think it's about resource assignments.
cbdump with the card inserted, from both kernels, could probably tell. If they're
different, you have to play with these lines:
  include memory 0xc0000-0xfffff
  include memory 0xa0000000-0xa0ffffff
  include memory 0x60000000-0x60ffffff
  include memory 0x10000000-0x1fffffff
One of those lines might contain resources that should not be used...

And if none of that helps, "git bisect" will tell you which change broke your box...
Comment 11 Ryan Underwood 2006-10-03 13:23:22 UTC
Down to 5 changes with git bisect. Another interesting message I saw:
eth0: mismatched read page pointers  1 vs 3f.
On one particular revision, this message alternates with the "Too much work at
interrupt" message.
Comment 12 Ryan Underwood 2006-10-03 14:11:43 UTC
The problem is introduced in one of the following patches:
dbb22f0d65ccc2e9dfeb4c420942f2757a80f8d2    [PATCH] pcmcia: access config_t using pointer instead of array
855cdf134dfcf2ecb92ac4ad675cf655d8ceb678    [PATCH] pcmcia: always use device pointer to config_t
360b65b95bae96f854a2413093ee9b79c31203ae    [PATCH] pcmcia: make config_t independent, add reference counting

Very strange.  I'll find the offending patch tomorrow since I'm out of time.
Comment 13 Ryan Underwood 2006-10-04 07:43:29 UTC
$ git bisect good
360b65b95bae96f854a2413093ee9b79c31203ae is first bad commit
commit 360b65b95bae96f854a2413093ee9b79c31203ae
Author: Dominik Brodowski <linux@dominikbrodowski.net>
Date:   Tue Jan 10 20:50:39 2006 +0100

    [PATCH] pcmcia: make config_t independent, add reference counting

    Handle config_t structs independent of struct pcmcia_socket, and add
    reference counting for them.

    Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>

:040000 040000 967add784fbb04f2af52f2e0f5e942284e95bbff 704326f0e53a1f83aae4403aac771357943c9cf2 M      drivers
:040000 040000 0be81163641a820bba7da5b68713c23fbe6f3d26 582e25d839c2abd4be56ffa38932a9a7188b6a69 M      include
Comment 14 Daniel Ritz 2006-10-05 12:19:50 UTC
Created attachment 9164 [details]
test patch

does the test patch make any difference?
Comment 15 Ryan Underwood 2006-10-06 09:11:59 UTC
Same problem exists (now using HEAD)
Comment 16 Ryan Underwood 2006-10-09 14:33:01 UTC
Any ideas?  I have no clue what's going on with that driver.
Comment 17 Ryan Underwood 2006-10-27 14:02:56 UTC
Do you have any suspects at all that I could play around with?
Comment 18 Daniel Ritz 2006-10-28 13:33:53 UTC
Sorry, I can't see what's wrong. CC'ing Dominik, as he might have a better idea.

Anyway, could you provide 'cbdump' output with the card inserted for a working
and a non-working kernel?
Comment 19 Dominik Brodowski 2006-11-07 19:11:48 UTC
Created attachment 9432 [details]
pcmcia-fixes-2.6.git

Could you test 2.6.19-rc5 with this patch, please?
Comment 20 Ryan Underwood 2006-11-07 19:17:48 UTC
I only have access to -rc4 in git, testing that now.
Comment 21 Ryan Underwood 2006-11-08 07:09:04 UTC
Works now!  So what was the problem?
Comment 22 Dominik Brodowski 2006-11-08 14:19:33 UTC
I'm not really sure :) Could you post the output of "lspcmcia -vvv", please?
Comment 23 Ryan Underwood 2006-11-09 09:53:10 UTC
Created attachment 9445 [details]
lspcmcia

Here you go
Comment 24 Dominik Brodowski 2006-11-12 15:44:57 UTC
Thanks. The bug fixed here was that the resource management code of the PCMCIA core
was informed too late that this is a multifunction card.
Comment 25 Ryan Underwood 2008-06-23 21:21:12 UTC
I've upgraded to 2.6.25.8 with the same config (make oldconfig) and this problem is back.
Comment 26 Dominik Brodowski 2008-06-24 00:36:24 UTC
What was the last kernel version which worked for you? Also, could you post a dmesg and "lspci -vvv" output of a working and non-working kernel, please?
Comment 27 Ryan Underwood 2008-06-24 22:20:52 UTC
I haven't bisected, since I don't have the time, but I am using 2.6.19-rc5, which seems to work.  Attachments follow.
Comment 28 Ryan Underwood 2008-06-24 22:22:38 UTC
Created attachment 16609 [details]
dmesg 2.6.19-rc5

Not too useful because of kobject debug spam.  Let me know if you need me to rebuild 2.6.19-rc5 with a bigger dmesg buffer.
Comment 29 Ryan Underwood 2008-06-24 22:23:24 UTC
Created attachment 16610 [details]
dmesg 2.6.25.8
Comment 30 Ryan Underwood 2008-06-24 22:23:50 UTC
Created attachment 16611 [details]
lspci 2.6.19-rc5
Comment 31 Ryan Underwood 2008-06-24 22:24:04 UTC
Created attachment 16612 [details]
lspci 2.6.25.8
Comment 32 Dominik Brodowski 2008-06-25 00:07:48 UTC
I'm a bit confused about this one... are "pcnet_cs" and "serial_cs" built in, or is just one of those a module which gets loaded afterwards? It shouldn't matter, but maybe this card needs pcnet_cs to be up and running before serial_cs...

Also, could you post an "lspcmcia -vvv" output (sorry for the typo yesterday) for the new kernel? We already have one for a working kernel. So far there's no need to bisect. What might be useful, though, is enabling PCMCIA_DEBUG and setting the "debug" parameter of the pcmcia module to 10: if pcmcia is built in, add "pcmcia.debug=10" to the kernel command line; if it's a module, run "modprobe pcmcia debug=10".
Comment 33 Ryan Underwood 2008-06-25 08:42:05 UTC
pcnet_cs and serial_cs are modules.  I will enable PCMCIA debug and post back.
Comment 34 Anonymous Emailer 2008-06-25 09:04:38 UTC
Reply-To: linux@dominikbrodowski.net

Could you try modprobing these modules first, before inserting the card?
Comment 35 Ryan Underwood 2008-06-25 17:05:08 UTC
Created attachment 16626 [details]
dmesg with pcmcia debug
Comment 36 Ryan Underwood 2008-06-25 17:06:02 UTC
Created attachment 16627 [details]
lspcmcia 2.6.19-rc5
Comment 37 Ryan Underwood 2008-06-25 17:06:20 UTC
Created attachment 16628 [details]
lspcmcia 2.6.25.8
Comment 38 Ryan Underwood 2008-06-25 17:18:14 UTC
Created attachment 16629 [details]
dmesg with modules pre-inserted
Comment 39 Ryan Underwood 2008-06-25 17:18:34 UTC
Created attachment 16630 [details]
lspcmcia with modules pre-inserted
Comment 40 Dominik Brodowski 2008-07-10 11:26:53 UTC
Hm, that's a tough one. Could you try out kernel 2.6.20 and check whether that works?
Comment 41 Ryan Underwood 2008-07-17 16:15:50 UTC
2.6.20 has this problem also.
Comment 42 Ryan Underwood 2008-11-05 20:11:57 UTC
And still no fix in 2.6.27.4.
Comment 43 Ryan Underwood 2008-11-05 20:41:27 UTC
Finally, some illumination.  I have several more of these IBM H&A credit card adapters.  There seems to have been a firmware update (HAWIN95.EXE) available at some point.  This firmware update seems to change the prodid string from "Home and Away Credit Card Adapter" to "w95 Home and Away Credit Card ".  It is the "w95" version that does not work with Linux; the card without the "w95" firmware works perfectly.  So where should I look now?
Comment 44 Ryan Underwood 2008-11-05 21:49:56 UTC
I did the obvious thing and flashed the "w95" card back to the original firmware.

Now it works.

Apparently, whatever hack IBM used to support Win95 renders the card incompatible with later versions of Linux.

Both the Win95 and the original firmware worked correctly with Linux up to 2.6.16.28, before some later change.
Comment 45 Ryan Underwood 2008-11-05 21:51:43 UTC
It seems that the IBM Win95 kludge has to do with how the separate devices on the multifunction card are enumerated.  Perhaps this is why the above patch worked for a time (and then things broke again later).
Comment 46 Komuro 2009-06-20 15:07:37 UTC
Hi,

Does this problem still exist?

Could you try kernel 2.6.30-git15 or newer?
Comment 47 Dominik Brodowski 2010-03-06 14:34:11 UTC
Even better would be trying out 2.6.34-rc1, once it gets out in a few days...