Bug 11067 - System Hangs while loading ISICOM module...
Summary: System Hangs while loading ISICOM module...
Status: CLOSED UNREPRODUCIBLE
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: All Linux
Importance: P1 high
Assignee: drivers_pci@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-07-10 05:31 UTC by Prabhat Gupta
Modified: 2012-05-22 12:47 UTC
CC List: 4 users

See Also:
Kernel Version: 2.6.18
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Char: isicom, enable/disable pci device (1.28 KB, patch)
2008-07-10 07:23 UTC, Jiri Slaby
Driver source which I am using. (48.25 KB, text/plain)
2008-07-11 04:30 UTC, Prabhat Gupta
Utility for loading the firmware for ISI card... (3.11 KB, text/plain)
2008-07-21 04:07 UTC, Prabhat Gupta
Log for ISICOM problem... (719.58 KB, text/plain)
2009-07-14 09:01 UTC, Prabhat Gupta

Description Prabhat Gupta 2008-07-10 05:31:08 UTC
Latest working kernel version: 2.6.18
Earliest failing kernel version: 2.6.18
Distribution: 2.6.18-53.el5
Hardware Environment: x86_64
Software Environment: Enterprise Linux
Problem Description:

My system configuration is x86_64 GNU/Linux with kernel 2.6.18-53.el5. I have a Multi-tech ISI card inserted into one of the PCI slots on my PC. I compiled the driver (isicom.c, available with the kernel source) and tried to load it using insmod. The system hangs after insmod. I added some prints and found that the system hangs in the function static int __devinit reset_card(), at the point where the card is reset by writing 0x0, i.e. at the line  outw(0, base + 0x8); /* Reset */.

I have not been able to find a solution. When I comment out this line, the system doesn't hang, but driver loading fails in static int __devinit load_firmware() with the warning "Card1 rejected load header, Address:0x8000, Count:0x10, Status:0xdd"

Any suggestion would be of immense help to me. Thanks in advance.

Regards,
Prabhat
Comment 1 Andrew Morton 2008-07-10 06:08:18 UTC
erk, nobody really maintains isicom.

2.6.18 is terribly old - is it possible for you to test something more recent?
Comment 2 Jiri Slaby 2008-07-10 07:02:30 UTC
Besides the fact that the device is not enabled (and it never was), I can't see any problem. The path in fact hasn't changed since 2.3.13pre7, where the PCI support was added, except for one point: previously there might have been seconds before the reset was called from an external program by ioctl; now it's called immediately.

Just to be sure: if you stick msleep(10000); at the beginning of reset_card, does anything change?

What kernel version worked for you?
Comment 3 Jiri Slaby 2008-07-10 07:23:25 UTC
Created attachment 16786
Char: isicom, enable/disable pci device

You can also try this patch, in case it helps.
Comment 4 Prabhat Gupta 2008-07-11 03:15:48 UTC
Dear Jiri,

I tried the changes that you suggested (msleep and enable/disable of the PCI device), but the results are the same. One interesting thing I noted (by adding msleep after outw) is that the kernel does not hang in outw itself. Rather, once outw has been done to reset the card, the system hangs as soon as anything is read afterwards (at present, the signature is read).

The card is working fine with a 2.4 kernel.
Comment 5 Prabhat Gupta 2008-07-11 03:33:45 UTC
The output from /var/log/messages is given below; I thought it might be of some help. Some of the prints were added by me.

Jul 11 16:08:53 lenlin44 kernel: isicom 0000:11:0a.0: Signature : 0xdd
Jul 11 16:08:53 lenlin44 kernel: isicom 0000:11:0a.0: -Done
Jul 11 16:08:53 lenlin44 kernel: isicom 0000:11:0a.0: After reset_card returned
Jul 11 16:08:53 lenlin44 kernel: isicom 0000:11:0a.0: Name : isi4608.bin
Jul 11 16:08:53 lenlin44 kernel: isicom 0000:11:0a.0: After request_firmware
Jul 11 16:08:53 lenlin44 kernel: isicom 0000:11:0a.0: Card1 rejected load header:
Jul 11 16:08:53 lenlin44 kernel: Address:0x8000
Jul 11 16:08:53 lenlin44 kernel: Count:0x10
Jul 11 16:08:53 lenlin44 kernel: Status:0xdd
Jul 11 16:08:53 lenlin44 kernel: isicom 0000:11:0a.0: After load_firmware
Comment 6 Prabhat Gupta 2008-07-11 03:35:25 UTC
The prints given in my previous comment only appear when I comment out the outw line that resets the card in reset_card(). Otherwise the system hangs.
Comment 7 Prabhat Gupta 2008-07-11 04:30:04 UTC
Created attachment 16796
Driver source which I am using.

Attached is the isicom.c driver source which I am using.
Comment 8 Jiri Slaby 2008-07-11 06:58:52 UTC
Ok, thanks for the info. Could you be more specific about which 2.4 kernel worked for you? And also which userspace applications you used to boot the card? Maybe I'm looking at the wrong code paths.

BTW, "Status:0xdd" could mean that the card is still returning the signature. It's the same register as the status. I think this might be caused by the card being locked up in some state, or by whatever else results from not resetting the card.
Comment 9 Jesse Barnes 2008-07-11 08:48:38 UTC
It's fairly common for drivers to have to wait a given number of PCI clock cycles after resetting a device before accessing registers again.  The behavior you're seeing does sound similar to what we'd see if there was no delay following the reset command, but if you already tried the msleep() there must be some other device specific issue here.

Hm, looking at the driver it appears we already sleep for a second after the reset, which *should* be plenty of time too...

Interestingly, there was an apparently untested change, 07fb6f26bab869fc3bb9df0a785ba734f4c51ac3, added to reset_card:

diff --git a/drivers/char/isicom.c b/drivers/char/isicom.c
index 85d596a..eba2883 100644
--- a/drivers/char/isicom.c
+++ b/drivers/char/isicom.c
@@ -1527,7 +1527,7 @@ static int __devinit reset_card(struct pci_dev *pdev,
        msleep(10);

        portcount = inw(base + 0x2);
-       if (!inw(base + 0xe) & 0x1 || (portcount != 0 && portcount != 4 &&
+       if (!(inw(base + 0xe) & 0x1) || (portcount != 0 && portcount != 4 &&
                                portcount != 8 && portcount != 16)) {
                dev_err(&pdev->dev, "ISILoad:PCI Card%d reset failure.\n",
                        card + 1);

Maybe it's to blame, though it looks more sensible than what was there previously, and only affects an error case?
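
The precedence change in that hunk can be checked in isolation. The following is a minimal userspace sketch, not driver code: old_check and new_check are hypothetical names, and reg stands in for whatever inw(base + 0xe) would return.

```c
#include <stdbool.h>
#include <stdint.h>

/* Old expression: '!' binds tighter than '&', so this parses as
 * (!reg) & 0x1 and is true only when the whole register reads zero. */
static bool old_check(uint16_t reg)
{
	return (!reg & 0x1) != 0;
}

/* New expression: true whenever bit 0 of the register is clear,
 * regardless of the other bits. */
static bool new_check(uint16_t reg)
{
	return !(reg & 0x1);
}
```

For a value such as 0x0002 (bit 0 clear, another bit set) the old form does not flag a failure while the new form does. Both forms only decide whether the error branch is taken after the reset; neither changes what is written to the card.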

Is the config space for the device still accessible after the reset?  If so, you could check to make sure the device hasn't lost its BAR assignments, since that would definitely cause trouble.  But since it looks like the driver never re-loaded the BARs in the past, that's probably not the problem...
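
One way to look at the live BAR values from userspace is to decode a config-space dump, for example the bytes shown by lspci -x or read from the device's sysfs config file. The sketch below assumes such a dump is already in a buffer; the helper names are hypothetical, and which BAR index the driver actually uses is something to confirm against the driver source.

```c
#include <stdint.h>

/* BAR n is a 32-bit little-endian register at 0x10 + 4*n in config space. */
#define PCI_BAR0_OFFSET 0x10

/* Read the raw value of BAR n from a config-space dump. */
static uint32_t bar_raw(const uint8_t *cfg, int n)
{
	const uint8_t *p = cfg + PCI_BAR0_OFFSET + 4 * n;

	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Bit 0 set marks an I/O-space BAR (as opposed to memory space). */
static int bar_is_io(uint32_t raw)
{
	return raw & 0x1;
}

/* For an I/O BAR, the base address is the value with bits 1:0 masked off. */
static uint32_t bar_io_base(uint32_t raw)
{
	return raw & ~(uint32_t)0x3;
}
```

A BAR that reads back as zero after the reset, where it held an I/O address before, would point to lost assignments. Note that pci_resource_start() in the kernel returns the value cached in struct pci_dev, so printing it before and after the reset does not by itself show what the device's config space currently holds.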
Comment 10 Prabhat Gupta 2008-07-12 00:01:59 UTC
The system details (from uname -a) of the machine on which the same card works fine are:

Linux localhost.localdomain 2.4.20-8 #1 Thu Mar 13 17:54:28 EST 2003 i686 i686 i386 GNU/Linux

I am also under the impression that the status field showing 0xdd may be the signature itself. I think the status field, which is read and expected to be zero, is initialized to zero only by the reset outw. Since I have commented out this line (otherwise the system hangs), the register is not initialized and hence isn't zero as assumed. Somehow the registers seem to become unreadable after the reset, so the system hangs when an attempt is made to read them just after the reset.

Kindly suggest something as I have already spent more than a week [ :-( ] debugging this thing. Thanks to you all in advance.
Comment 11 Jiri Slaby 2008-07-12 00:48:01 UTC
Prabhat: And the utility for uploading the firmware?

Also could you add printk(KERN_DEBUG "%lx\n", pci_resource_start(pdev, 3));
before and after the Reset outw(); (I hope I understood Jesse correctly)?

Also I can't see a problem in the irq handler vs. init unless you unbind the device and bind it back (the status flag is not cleared). But if you comment out request_irq(); or add return IRQ_NONE; into isicom_interrupt(); unconditionally, does anything change (some kind of spurious interrupt, a disabling message instead of a hang)?

BTW, what kind of hang are we talking about? Is the machine locked up due to some stuck bridge or the like, or is it an oops -> panic?

Jesse: (I see it as 07fb6f26 and) it is 2.6.26 material; see Prabhat's version of the driver in the attachment, it's from 2.6.18.
Comment 12 Prabhat Gupta 2008-07-12 01:56:19 UTC
Dear Jiri,

Which utility are you talking about? I am not using any external userspace utility for loading the firmware. Is that required? As far as I understand, the firmware is loaded by static int __devinit load_firmware().

On my system, nothing responds: neither mouse nor keyboard works, and I am left with no option other than a hard reboot. When the system next comes up, /var/log/messages doesn't contain any kernel crash prints as such, only the prints that I added in the isicom driver.

As per Jesse's suggestion, I added the prints; they are given below for your reference:

Jul 12 14:21:28 lenlin44 kernel: isicom 0000:11:0a.0: ISI PCI Card(Device ID 0x2028)
Jul 12 14:21:38 lenlin44 kernel: isicom 0000:11:0a.0: Before reset
Jul 12 14:21:38 lenlin44 kernel: isicom 0000:11:0a.0: ioaddr before reset : 0x2080
Jul 12 14:21:38 lenlin44 kernel: isicom 0000:11:0a.0: ioaddr after reset : 0x2080
Jul 12 14:21:38 lenlin44 kernel: isicom 0000:11:0a.0: After reset

After the last print, the rest of the prints don't appear, as the system had already hung. (I uncommented the outw reset line while adding the prints; I guess that was Jesse's intent as well.)

I am not sure, but could it be a 32-bit versus 64-bit architecture issue? Maybe outw on 64-bit is writing out of bounds, i.e. to some invalid address (on the PCI bus, which the ISI card can't handle). It's just a wild guess on my side, so please excuse me if I am wrong :)
Comment 13 Jiri Slaby 2008-07-12 14:30:33 UTC
Are you testing this on the console? Do you have the kernel messages console (kmsg_redirect) set to the console you are testing on (i.e. do you see any messages from the kernel on the console)? If it panics, you should see the panic message. I'm not sure when this was added, but don't your caps and scroll locks blink when the lockup happens?

What happens when you disable the irq as mentioned earlier?

I suppose the device is behind some 32-bit bridge and the device can handle byte-enables of 16-bit transfers on a 32-bit bus just fine, if it worked before. Hrm. Have you also changed the platform between 2.4 and 2.6?
Comment 14 Jiri Slaby 2008-07-13 04:12:47 UTC
Ah, and the utility. I meant the one you used with the 2.4 kernel.
Comment 15 Prabhat Gupta 2008-07-14 04:51:12 UTC
Hello Jiri,

Yes, I am testing this on the console, using insmod to insert the module from a terminal. I have never used kmsg_redirect, but I assume it is done with something like the following:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>          // ioctl(), TIOCLINUX
#include <linux/tiocl.h>        // TIOCL_SETKMSGREDIRECT (11)

int main( int argc, char **argv )
{
        // bytes[0] is the TIOCLINUX subcode, bytes[1] the target console number
        char bytes[2] = {TIOCL_SETKMSGREDIRECT, 0};
        if (argc != 2)
        {
                fprintf( stderr, "%s: need a single argument\n", argv[0] );
                exit(1);
        }
        bytes[1] = atoi( argv[1] );     // console id-number

        int fd = open( "/dev/console", O_RDWR );
        if (fd < 0)
        {
                perror( "/dev/console" );
                exit(1);
        }

        if ( ioctl( fd, TIOCLINUX, bytes ) < 0 )
        {
                fprintf( stderr, "%s: ioctl( fd, TIOCLINUX, ...):", argv[0] );
                fprintf( stderr, " %s\n", strerror( errno ) );
                exit(1);
        }

        exit(0);
}

But I am not sure whether this can be run as a separate program to set the flag, or whether this code should be added to isicom.c. Please suggest. At present, no kernel panic messages appear on the console. Nothing happens; the whole system just hangs. But I guess, even if the kernel panic messages appear, they should be present in /var/log/messages when the system comes up the next time, but /var/log/messages only contains the debugging prints that I add. The CAPS/SCROLL lock keys also don't respond.

By disabling the irq, probably you are suggesting me to comment out the function call isicom_register_isr(). I did that but the behavior remains the same.

With 2.4, the standard package distribution (by MultiTech) included the program to load the firmware.
Comment 16 Prabhat Gupta 2008-07-14 04:53:09 UTC
Hi again Jiri,

Yes, the hardware architecture was also changed when I moved on from 2.4 kernel to 2.6 kernel. I think that should be clear from my original posts also. 2.6 is on x86_64 (64-bit), while 2.4 was i386 (32-bit).
Comment 17 Prabhat Gupta 2008-07-15 23:29:51 UTC
I am not getting any replies. I am eagerly waiting for someone to suggest something. Thanks in advance.
Comment 18 Jiri Slaby 2008-07-18 14:07:26 UTC
> But I guess, even if the kernel panic messages appear, they should be present
> in /var/log/messages

Wrong guess. If the kernel panics, it most likely won't store anything to disk.

> comment out the function call isicom_register_isr(). 

Exactly.

> standard package distribution (by MultiTech)

Do you have sources of that? Which version was that?

I suggest you test the latest kernel. This doesn't seem to be a driver problem.
Comment 19 Prabhat Gupta 2008-07-21 04:05:53 UTC
Hi Jiri,

"Do you have the sources of that" -----> Yes. I am attaching the source of the utility for your reference.

Thanks for your suggestion. I will try testing the thing with the latest kernel.

Regards,

Prabhat
Comment 20 Prabhat Gupta 2008-07-21 04:07:57 UTC
Created attachment 16917
Utility for loading the firmware for ISI card...
Comment 21 Prabhat Gupta 2008-08-13 03:07:19 UTC
Hi,

The isicom module loads successfully with the new kernel (2.6.26) on my system. But now I am facing a new problem. When I open /dev/ttyM1a in minicom, the system hangs. What could be the issue? Thanks in advance.

Regards,

Prabhat
Comment 22 Jiri Slaby 2008-08-14 07:55:15 UTC
Could you grab console output?

If you strace minicom, what's the last thing you see?
Comment 23 Prabhat Gupta 2008-08-28 02:30:57 UTC
Hello Jiri,

I ran minicom with strace, but cannot see any trace. It just hangs with nothing on the screen. I tried the same thing on another PC, with the same ISI card and kernel 2.6.26. The behavior remains the same: minicom hangs. To my surprise, it worked once. But after that, however many times I tried, it just hangs.
Comment 24 Prabhat Gupta 2009-07-14 09:01:20 UTC
Created attachment 22337
Log for ISICOM problem...

Hello All,

I have been working on the same bug, this time with kernel version 2.6.28.3. The isicom driver loads successfully with this kernel, but on opening minicom on /dev/ttyM1a, the system becomes nearly inoperable (extremely slow to respond). This time /var/log/messages shows some prints related to SATA and minicom, which I am attaching. Any suggestions would be highly useful.

Regards,

Prabhat
