Bug 5790 - serial8250: too much work for irq11
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Serial
Hardware: i386 Linux
Importance: P2 high
Assignee: Len Brown
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-12-27 22:54 UTC by Kevin
Modified: 2008-09-22 09:09 UTC

See Also:
Kernel Version: 2.6.13-15
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
output from dmesg (25.70 KB, text/plain)
2005-12-30 10:42 UTC, Kevin
contents of /proc/interrupts (520 bytes, text/plain)
2005-12-30 10:43 UTC, Kevin
Dmesg output (14.71 KB, application/octet-stream)
2007-11-19 09:55 UTC, Kevin
lspci output (10.14 KB, application/octet-stream)
2007-11-19 09:56 UTC, Kevin
output from acpidump (91.63 KB, application/octet-stream)
2007-11-19 09:57 UTC, Kevin
/proc/interrupts (564 bytes, application/octet-stream)
2007-11-19 09:58 UTC, Kevin
Dmesg output (86.26 KB, application/octet-stream)
2007-11-20 13:15 UTC, Kevin

Description Kevin 2005-12-27 22:54:28 UTC
Most recent kernel where this bug did not occur:
Distribution: SUSE
Hardware Environment: P4M UP-256MB RAM-ALI motherboard with lapic disabled in 
BIOS-Linux installed on external Maxtor drive connected thru' USB (Laptop - 
BenQ Joybook 3000 (DH3000))
Software Environment: SUSE Linux 10.0 2.6.13-15
Problem Description: My console gets flooded by the message "Too many requests 
for irq 11", presumably because several devices are using the same irq 11 !! 
To list them: graphics card NVidia GeForce4, sound card Realtek AC'97, USB 
mouse, USB storage (the Maxtor 300GB drive on which Linux resides), modem PCTel 
HSP 56MR and IEEE 1394 FireWire !! It is definitely affecting the performance 
of the hardware, because sometimes the mouse and keyboard do not respond and I can 
see my Maxtor drive working really hard. 
I tried to work around by passing lapic command line option for which I get 
the following output-
Local APIC disabled by BIOS - You can enable it by "lapic" 

I have compiled a kernel with local APIC support enabled along with all the 
necessary options, besides enabling ACPI and disabling APM. Yet I get the same 
message on my monitor. When passed the kernel parameter "lapic", the kernel 
hangs and doesn't respond to Ctrl+Alt+Del, so I have to power down using the 
power button. With the "lapic" option, the kernel dumps a screenful of text such as 
scanning PCI bridge, CPU etc. The last line I see is "PCI: Using configuration 
1, PCI base address xxxxxxx". After that it's eternal silence.
How can I force enable Local APIC to resolve this IRQ sharing problem ?

Steps to reproduce:
Comment 1 Len Brown 2005-12-28 19:21:55 UTC
It isn't clear where this message is coming from.
The string "Too many requests" does not seem to be
present in SuSE's 2.6.13-15 x86_64 kernel.

Are you running a binary nvidia driver?
  if yes, can you reproduce the issue without it?

Can you reproduce this with a kernel.org kernel?

Can you unload the device drivers one at a time
to see if a certain driver being present is causing this?

Two things to consider...

1. does the system have an IOAPIC -- the "IO" here is important.
   simple way to tell is to build and boot a CONFIG_SMP=y
   kernel and attach the complete output from dmesg -s64000
   and copy of /proc/interrupts

2. if you're stuck with PIC mode, can you move interrupts
   around?  The dmesg and /proc/interrupts from the failing
   case will tell us more about this.  it may be possible to
   free up some irqs and move some around by passing the
   kernel "acpi_irq_balance".
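
For anyone wanting to answer both questions from a saved copy of /proc/interrupts, here is a minimal Python sketch. It assumes the 2.6-era layout (one count column per CPU, then a one-token controller name such as XT-PIC or IO-APIC-level, then a comma-separated device list). The helper names and the SAMPLE text are made up for illustration; SAMPLE is not the reporter's attachment.

```python
# Hypothetical sample in the 2.6-era /proc/interrupts layout (not real data).
SAMPLE = """
           CPU0
  0:    1234567          XT-PIC  timer
  1:       4321          XT-PIC  i8042
 11:     987654          XT-PIC  ohci1394, yenta, uhci_hcd:usb1, serial
 12:      55555          XT-PIC  i8042
NMI:          0
ERR:          5
"""


def parse_interrupts(text):
    """Return {irq: (controller, [devices])} for the numbered IRQ lines."""
    lines = text.strip("\n").splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():
            continue  # skip summary rows such as NMI and ERR
        fields = rest.split()
        if len(fields) <= ncpu:
            continue  # no controller column on this row
        controller = fields[ncpu]
        names = " ".join(fields[ncpu + 1:]).split(",")
        devices = [n.strip() for n in names if n.strip()]
        table[int(label)] = (controller, devices)
    return table


def shared_irqs(table):
    """IRQ lines that more than one device is registered on."""
    return {irq: devs for irq, (_, devs) in table.items() if len(devs) > 1}


def has_ioapic(table):
    """True if any IRQ is delivered through an IO-APIC controller."""
    return any("IO-APIC" in ctrl for ctrl, _ in table.values())


if __name__ == "__main__":
    table = parse_interrupts(SAMPLE)
    print("IOAPIC in use:", has_ioapic(table))
    for irq, devs in sorted(shared_irqs(table).items()):
        print(f"irq {irq} shared by: {', '.join(devs)}")
```

If every controller column reads XT-PIC, the system is running in PIC mode and question 2 applies.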
Comment 2 Kevin 2005-12-30 10:42:23 UTC
Created attachment 6904
output from dmesg
Comment 3 Kevin 2005-12-30 10:43:19 UTC
Created attachment 6905
contents of /proc/interrupts
Comment 4 Kevin 2005-12-30 10:44:52 UTC
Thanks for the quick response.

I apologize for the wrong title of my post - "Too many requests for 
irq11" should have been "Too much work for irq11". I wonder what made me 
think of the string "Too many request.." !

I do use the nvidia binary driver, which gets loaded when runlevel 5 is entered. 
But the string "Too much work for irq11" is dumped on my screen much earlier, 
when the irqs are assigned/shared, I think.

I haven't had the time and chance to try and reproduce the same with a 
kernel.org kernel though, I might in the next one or two days and shall let 
you know of my experience with the same.

Acting on your suggestion to compile an SMP kernel and boot with it, I've 
done so and have attached the output from dmesg and /proc/interrupts.

Now, about the second suggestion about moving about irqs. I wouldn't mind 
disabling Synaptics touchpad as I use a USB mouse and think this would free up 
an irq. Same goes for my network card which I don't use as I'm not connected 
to any network. May be I can disable the card reader as well if I could tell 
if it is occupying an irq all for itself.

Should you need further information, I'm eager to provide whatever info is 
required to resolve this.

I certainly am interested in knowing if my system has an IOAPIC, which I 
couldn't tell for sure looking at the output of the dmesg.

Thanks & regards,

Kevin
Comment 5 Kevin 2006-01-13 21:47:13 UTC
Sorry if this comes in as a nuisance ! 
I was wondering if my problem is being looked at !? 
Thanks & regards
Comment 6 Len Brown 2007-08-19 00:46:46 UTC
No, this system does not appear to have an IOAPIC.

> Local APIC disabled by BIOS -- reenabling.
> Found and enabled local APIC!

Please do _not_ boot with the "lapic" parameter.
Assume that the BIOS disabled the LAPIC for a good reason.

> Kernel command line: root=/dev/sda7 selinux=0  resume=/dev/sda11 
> splash=silent showopts lapic apic=debug acpi=ht

Please do not boot with "acpi=ht" -- it is effectively
the same as "acpi=off" on this system.  However, you have
effectively proved that the issue at hand is not caused by ACPI:-)

> serial8250: too much work for irq11

The first part of this message turns out to be important,
it tells us that it is the serial 8250 driver complaining....
In particular, serial8250_interrupt() is complaining
that it has too much to do.

Indeed, it appears to be polling 4 serial ports on a shared IRQ...

So fix the bootparams above, re-test with something recent, say 2.6.22
and let us know how things look...
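
To make the complaint concrete, here is a simplified Python model of a shared-IRQ handler with a pass limit. It is a sketch of the idea, not the kernel's code: the handler keeps polling every port on the IRQ until one whole pass finds nothing pending, and gives up with the message above once it exceeds a pass limit. The limit of 256 and the helper names (`port_with`, `stuck_port`) are assumptions for illustration.

```python
import itertools

PASS_LIMIT = 256  # assumption: illustrative limit, not read from any kernel source


def handle_shared_irq(irq, ports, pass_limit=PASS_LIMIT):
    """Poll all ports on one IRQ; return (passes, message or None).

    Each port is a callable that services one pending event and returns
    True, or returns False if it had nothing to do.
    """
    for passes in itertools.count(1):
        serviced = False
        for port in ports:
            if port():
                serviced = True
        if not serviced:
            return passes, None  # a full pass found nothing pending
        if passes > pass_limit:
            return passes, f"serial8250: too much work for irq{irq}"


def port_with(events):
    """Hypothetical port that has a fixed number of pending events."""
    remaining = [events]

    def poll():
        if remaining[0] > 0:
            remaining[0] -= 1
            return True
        return False

    return poll


def stuck_port():
    """Hypothetical port that always looks like it has work pending."""
    return lambda: True


if __name__ == "__main__":
    quiet = [port_with(3), port_with(0), port_with(1), port_with(0)]
    print(handle_shared_irq(11, quiet))
    noisy = [port_with(3), stuck_port(), port_with(1), port_with(0)]
    print(handle_shared_irq(11, noisy))
```

In this model a single port that never goes quiet is enough to trip the limit, even when the other ports on the IRQ are idle.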
Comment 7 Fu Michael 2007-11-12 22:32:41 UTC
Kevin, any response?
Comment 8 Kevin 2007-11-13 10:06:03 UTC
Ooooops ! Sorry for the late reply.  I just didn't notice the notification mail ! This bug report has been so long forgotten !

First of all, thanks a million for chasing this long forgotten bug ! :-)

Yes, I actually upgraded the kernel and currently I'm using 2.6.23-rc4 with the realtime patch, and things are working just great. With 2.6.22 I did see similar messages a bunch of times, and never witnessed those messages after I applied the realtime patch to 2.6.22.

FYI, I'm not using any bootparams and LAPIC is still disabled. I guess, either my laptop doesn't support LAPIC or there's no way to enable it.

I would be glad to provide any information you might need further to this. Otherwise, please feel free to mark this bug as resolved/closed.

Thanks again.

Regards,

Kevin
Comment 9 Fu Michael 2007-11-18 18:52:59 UTC
Kevin, are you able to test 2.6.23-rc4 _without_ the realtime patch? I want to know if the fix has been upstreamed or is still only in the realtime patch. Thanks.
Comment 10 Len Brown 2007-11-19 08:08:18 UTC
per michael's comment, it would be good to know if this is gone
because of the realtime patch, or if it is gone in vanilla upstream.

but in either case, this isn't an acpi bug -- as it happens
both with and without ACPI enabled.

It might be good to try to do less interrupt sharing.
This is a challenge with a system that has no IOAPIC.
But you might find that in the  BIOS setup you can
either move the IRQ for motherboard devices, or disable
devices that you are not using.

If you want to move interrupts from within Linux, then
you need to attach the complete dmesg from an ACPI-enabled
boot, the output from lspci, and the output from acpidump.
(though i can't tell with what we've got so far if ACPI
 has any flexibility to move the IRQs at all).

In any case, the problem at hand appears to be the actual serial
device, so I'm moving this to drivers/serial.
Comment 11 Kevin 2007-11-19 09:55:36 UTC
Created attachment 13618
Dmesg output
Comment 12 Kevin 2007-11-19 09:56:21 UTC
Thanks for the replies.

I would like to mention a couple of things before I move on to providing the gory details.

First, I have moved from SuSE to BLFS.

Secondly, the problem seemingly disappeared with the introduction of kernel series 2.6.19, as I don't remember having seen those messages since then !

On the contrary, I once installed Gentoo (just for the fun of it), which used 2.6.21 (if I remember it right), and encountered similar messages. I am sure it was again about too many requests for irq 11, but I don't remember if it was from the serial driver itself !

Could it be possible that irq handling code changed somewhere in series 2.6.19 which possibly prevented the problem to some extent and the realtime patch prevents it even further so, I don't get to see those messages at all !? But, one thing for sure, I haven't seen those messages on my BLFS system since 2.6.19 which was without the realtime patch. I started applying realtime patch since 2.6.21.

The machine I'm using is a laptop  and the BIOS on this machine does not provide any options to (re)assign/change irqs at all. I would love to disable some of the devices on this machine which I haven't used till this day. These include firewire, card reader, PCMCIA slot, synaptics touchpad (which takes an irq all for itself !!) but, there simply is no way I can do it, unless I hack the BIOS ROM (I haven't the slightest clue how !?).

I've attached output from dmesg, lspci and acpidump.

Thanks again.

Kevin
Comment 13 Kevin 2007-11-19 09:56:55 UTC
Created attachment 13619
lspci output
Comment 14 Kevin 2007-11-19 09:57:19 UTC
Created attachment 13620
output from acpidump
Comment 15 Kevin 2007-11-19 09:58:41 UTC
Created attachment 13621
/proc/interrupts
Comment 16 Kevin 2007-11-19 10:04:44 UTC
Sorry, forgot to mention.

I have attached /proc/interrupts as I found an extra line in the end which seems to be about irq errors ?

Besides, I would want to test 2.6.23 without the realtime patch but, I guess this will take some time. So, I will come back and report once I've tested it for a day or two.

Thanks.

Kevin
Comment 17 Len Brown 2007-11-19 13:03:11 UTC
If you can still reproduce an issue with the upstream kernel,
you need to be able to reproduce it without the binary nvidia driver
for the bug report to be valid here.

But I don't see the offending device in /proc/interrupts attached,
just i8042 with dedicated irq1 and irq12.

Re: interrupt control, we actually have some on this system:

ACPI: PCI Interrupt Link [LNKA] (IRQs 7 10 *11)
ACPI: PCI Interrupt Link [LNKB] (IRQs 7 *10 11)
ACPI: PCI Interrupt Link [LNKC] (IRQs 7 10 *11)
ACPI: PCI Interrupt Link [LNKD] (IRQs 10 *11)
ACPI: PCI Interrupt Link [LNKE] (IRQs 3 5) *0, disabled.
ACPI: PCI Interrupt Link [LNKF] (IRQs 6) *4
ACPI: PCI Interrupt Link [LNKG] (IRQs 7 10 *11)
ACPI: PCI Interrupt Link [LNKU] (IRQs 7 10 *11)
ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 11
ACPI: PCI Interrupt 0000:00:05.0[A] -> Link [LNKD] -> GSI 11 (level, low) -> IRQ 11
ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10
ACPI: PCI Interrupt 0000:00:05.1[B] -> Link [LNKB] -> GSI 10 (level, low) -> IRQ 10
ACPI: PCI Interrupt Link [LNKG] enabled at IRQ 11
ACPI: PCI Interrupt 0000:00:04.1[B] -> Link [LNKG] -> GSI 11 (level, low) -> IRQ 11
ACPI: PCI Interrupt 0000:00:04.0[A] -> Link [LNKG] -> GSI 11 (level, low) -> IRQ 11
ACPI: PCI Interrupt Link [LNKF] enabled at IRQ 6
ACPI: PCI Interrupt 0000:00:0e.1[B] -> Link [LNKF] -> GSI 6 (level, low) -> IRQ 6
ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 11
ACPI: PCI Interrupt 0000:00:0f.0[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11
ACPI: PCI Interrupt 0000:00:0f.1[B] -> Link [LNKB] -> GSI 10 (level, low) -> IRQ 10
ACPI: PCI Interrupt 0000:00:0f.2[C] -> Link [LNKB] -> GSI 10 (level, low) -> IRQ 10
ACPI: PCI Interrupt 0000:01:00.0[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11

The only thing that is odd is this one:
ACPI: PCI Interrupt Link [LNKF] (IRQs 6) *4
We are handed this link on IRQ4, and told that IRQ6 is the only
valid IRQ for it.  Your ethernet seems to be working fine.
But if you reproduce the issue, try it with and without the
ethernet plugged -- just for grins.
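
The same reading of the log lines above can be done mechanically. The Python sketch below takes the lines quoted in this comment, counts how many PCI functions end up on each IRQ, and flags any enabled link whose current IRQ is not in its own list of valid IRQs. The regular expressions and function names are illustrative assumptions that match only the message format shown here.

```python
import re
from collections import Counter

# The ACPI lines quoted in this comment, verbatim.
LOG = """\
ACPI: PCI Interrupt Link [LNKA] (IRQs 7 10 *11)
ACPI: PCI Interrupt Link [LNKB] (IRQs 7 *10 11)
ACPI: PCI Interrupt Link [LNKC] (IRQs 7 10 *11)
ACPI: PCI Interrupt Link [LNKD] (IRQs 10 *11)
ACPI: PCI Interrupt Link [LNKE] (IRQs 3 5) *0, disabled.
ACPI: PCI Interrupt Link [LNKF] (IRQs 6) *4
ACPI: PCI Interrupt Link [LNKG] (IRQs 7 10 *11)
ACPI: PCI Interrupt Link [LNKU] (IRQs 7 10 *11)
ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 11
ACPI: PCI Interrupt 0000:00:05.0[A] -> Link [LNKD] -> GSI 11 (level, low) -> IRQ 11
ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10
ACPI: PCI Interrupt 0000:00:05.1[B] -> Link [LNKB] -> GSI 10 (level, low) -> IRQ 10
ACPI: PCI Interrupt Link [LNKG] enabled at IRQ 11
ACPI: PCI Interrupt 0000:00:04.1[B] -> Link [LNKG] -> GSI 11 (level, low) -> IRQ 11
ACPI: PCI Interrupt 0000:00:04.0[A] -> Link [LNKG] -> GSI 11 (level, low) -> IRQ 11
ACPI: PCI Interrupt Link [LNKF] enabled at IRQ 6
ACPI: PCI Interrupt 0000:00:0e.1[B] -> Link [LNKF] -> GSI 6 (level, low) -> IRQ 6
ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 11
ACPI: PCI Interrupt 0000:00:0f.0[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11
ACPI: PCI Interrupt 0000:00:0f.1[B] -> Link [LNKB] -> GSI 10 (level, low) -> IRQ 10
ACPI: PCI Interrupt 0000:00:0f.2[C] -> Link [LNKB] -> GSI 10 (level, low) -> IRQ 10
ACPI: PCI Interrupt 0000:01:00.0[A] -> Link [LNKA] -> GSI 11 (level, low) -> IRQ 11
"""

# "Link [NAME] (IRQs a b *c)" with an optional "*n" and ", disabled" after it.
LINK_RE = re.compile(r"Link \[(\w+)\] \(IRQs ([\d *]+)\)(?: \*(\d+))?(, disabled)?")
# "PCI Interrupt <device>[pin] -> Link [...] -> ... -> IRQ n"
ROUTE_RE = re.compile(r"PCI Interrupt (\S+)\[[A-D]\] -> Link \[\w+\] .* -> IRQ (\d+)")


def parse_links(log):
    """Return {link: {"allowed": [...], "current": int or None, "disabled": bool}}."""
    links = {}
    for name, irqs, outside, disabled in LINK_RE.findall(log):
        tokens = irqs.split()
        allowed = [int(t.lstrip("*")) for t in tokens]
        starred = [int(t[1:]) for t in tokens if t.startswith("*")]
        current = starred[0] if starred else (int(outside) if outside else None)
        links[name] = {"allowed": allowed, "current": current,
                       "disabled": bool(disabled)}
    return links


def odd_links(links):
    """Enabled links handed over on an IRQ that is not in their valid list."""
    return [name for name, l in links.items()
            if not l["disabled"] and l["current"] not in l["allowed"]]


def devices_per_irq(log):
    """Count the PCI functions routed to each IRQ."""
    return Counter(int(irq) for _dev, irq in ROUTE_RE.findall(log))


if __name__ == "__main__":
    print("odd links:", odd_links(parse_links(LOG)))
    print("functions per irq:", dict(devices_per_irq(LOG)))
```

On the lines quoted here it reports LNKF as the only odd link, with five PCI functions on IRQ 11, three on IRQ 10 and one on IRQ 6.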

Re: balancing IRQs a bit

If you are daring, you can boot with "acpi_irq_balance".
it will try to balance the IRQs -- though the reason we don't
enable this by default on PIC systems is that how well it works
depends a lot on the quality of the BIOS.

Re: ERR:          5

they are from this:

spurious 8259A interrupt: IRQ7.

which is not a big deal, unless you get so many that they are impacting performance.

Re: acpi_os_name="Microsoft Windows XP"

This is a NO-OP on your system.
Indeed, it is a NO-OP on all systems I've ever seen.
If somebody is telling folks to use this, I'd like to know why.
Comment 18 Kevin 2007-11-20 13:14:19 UTC
Thanks for the reply.

I downloaded and compiled (without any patches) 2.6.23.8 and ran it for 8 hours (without the binary nvidia driver and with the 'nv' driver) and I couldn't see anything relevant. So, I guess, the problem has been fixed in the upstream kernel ! :-)

Please note that, I ran the above with both acpi_irq_balance set and unset. To be specific, I ran it without acpi_irq_balance for a couple of hours and with acpi_irq_balance set for the remaining duration. Setting acpi_irq_balance results in soft lockup bug. I've attached dmesg output for your info.

I couldn't remember what the device 8250 is ? I think it could be the infra red port on my laptop. I tried enabling it by loading the necessary FIR modules, only to crash my system real hard !! :-(

I'm in total agreement regarding acpi_os_name="Microsoft Windows XP". I'd hate to have that line and since, it hasn't had any advantages at all, I just removed the line from the boot menu. The only reason to include it was suspecting the possibility that, BIOS after detecting non MS OS, might disable some of its features. Oh, yes ! DSDT had a couple of errors and I corrected those, recompiled and have it loaded through initrd. To facilitate it further, I decided to include the acpi_os_name thingy which, made no difference and hence, has vanished from the boot menu now ! ;-)

Thanks again.

Kevin
Comment 19 Kevin 2007-11-20 13:15:21 UTC
Created attachment 13656
Dmesg output