Bug 40002 - IRQ 0 assigned to VGA
Summary: IRQ 0 assigned to VGA
Status: CLOSED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: Config-Interrupts
Hardware: All Linux
Importance: P1 high
Assignee: Feng Tang
URL: https://bugzilla.novell.com/show_bug....
Keywords:
Depends on:
Blocks:
 
Reported: 2011-07-25 17:40 UTC by Szymon Kowalczyk
Modified: 2012-07-14 15:14 UTC

See Also:
Kernel Version: 2.6.37 and up
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
output of vga irq (lspci, dmesg, acpidump, uname) on 3.2.0.7(bad) and 2.6.34-12(ok) (80.45 KB, application/zip)
2012-01-28 17:11 UTC, Szymon Kowalczyk
lspci -vvvx (17.00 KB, text/plain)
2012-05-24 19:43 UTC, Szymon Kowalczyk
/proc/interrupts (1.15 KB, text/plain)
2012-05-24 19:44 UTC, Szymon Kowalczyk
acpi_debug.patch (923 bytes, text/plain)
2012-05-25 00:53 UTC, Feng Tang
dmesg with apic=debug after fix and debug patch (56.07 KB, text/plain)
2012-05-26 08:53 UTC, Szymon Kowalczyk
lspci -vvvx after fix patch (17.00 KB, text/plain)
2012-05-26 08:56 UTC, Szymon Kowalczyk
/proc/interrupts after fix patch (1.21 KB, text/plain)
2012-05-26 08:57 UTC, Szymon Kowalczyk
dmidecode after fix (9.13 KB, text/plain)
2012-05-28 05:07 UTC, Szymon Kowalczyk
new acpi patch (1.24 KB, text/plain)
2012-05-28 05:59 UTC, Feng Tang
dmesg with acpi=debug, patched kernel (54.62 KB, text/plain)
2012-05-29 02:06 UTC, Szymon Kowalczyk

Description Szymon Kowalczyk 2011-07-25 17:40:43 UTC
In openSUSE 11.4 and also in Ubuntu 10.14, my video card

S3 UniChrome Pro P4M800 Pro

PCI:*(0:1:0:0) 1106:3344:1734:109b rev 1, Mem @ 0xf0000000/67108864,
0xd1000000/16777216


has IRQ 0 assigned.

In openSUSE 11.3 and earlier it has IRQ 16.

I can use the system, but I have to click or move the mouse from time to time
because otherwise the system freezes.

Adding the boot parameters

pci=noacpi
acpi=off

freezes the system immediately after the boot splash appears.

I have no option in the BIOS for ACPI/IRQ settings.

The computer is a

Fujitsu-Siemens Amilo Pro V2030D

Reproducible: Always

"Yeah, this looks like a very broken irq assignment code bug:
-pci 0000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
+pci 0000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 0"


more info:

/proc/interrupts
dmesg
lspci -v -s 01:00.0
acpidump
dmidecode
lspci -nn
Comment 1 Zhang Rui 2012-01-18 05:28:34 UTC
It's great that the kernel bugzilla is back.

Can you please verify if the problem still exists in the latest upstream
kernel?

If the problem still exists, please attach the following
for both the working and the broken kernel:
  - the output of acpidump
  - the output of lspci -vvxxx -s 01:00.0
  - the dmesg output after boot
Comment 2 Szymon Kowalczyk 2012-01-28 17:08:02 UTC
The problem still exists.

The attachment is a zip file with the requested outputs.

The latest kernel tested is: 3.2.0-7.1

The working kernel version is: 2.6.34-12

There is more info on the openSUSE bugzilla:

https://bugzilla.novell.com/show_bug.cgi?id=676068

####################################################

Thomas Renninger 2011-12-21 08:08:37 UTC

This has been briefly discussed with Eric Biederman (unfortunately no mailing
list was in CC):

> On Sunday 17 April 2011 02:27:51 Eric W. Biederman wrote:
>> Thomas Renninger <trenn@suse.de> writes:
>> 
>> But reading the dmesg I can see clearly that the timer is not working
>> on that irq pin pair.
> Not sure I understood this, we have:
>   - irq0 override happens
>   - irq0 override does not happen

In both cases it appears the irq0 override happens.  In the old code it
was just more confusing, and the override happened poorly enough that it
was still possible to use gsi 16 as linux irq 16.  I don't remember the
bugs well enough to remember how that could have happened.

> But in both cases the timer does not work, but in one it at least fall
> back to working lapic timer?

We initially fall back to finding the timer through virtual wire mode,
and in older kernel we switch from virtual wire mode to the lapic
timer.

> But in both cases one sees a sane number of timer interrupts happening 
> on irq0?

Ultimately.

> And it's about this line:
> ACPI: IRQ0 used by override.
> not the irq 16->0 override, right?

They should be different reports of the same mechanism at work.

>> In both kernels I see.
>> [    0.028398] ..TIMER: vector=0x30 apic1=0 pin1=16 apic2=-1 pin2=-1
>> [    0.032000] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
>> [    0.032000] ...trying to set up timer (IRQ0) through the 8259A ...
>> [    0.032000] ..... (found apic 0 pin 16) ...
>> [    0.032000] ....... failed.
>> [    0.032000] ...trying to set up timer as Virtual Wire IRQ...
>> [    0.075529] ..... works.
>> 
>> The vga card does not claim the interrupt in the newer kernels.  I
>> expect that is because the linux irq became 0, which we use to signal
>> no irq has been assigned.
> And 0 (the override) is wrong, it must be irq 16 for the vga card, 
> right?

The VGA card could probably work at any other linux irq number.
The problem is that, because we know 0 is only used by the ISA
timer irq, we have special-cased 0 to also mean no irq has
been assigned.

Except in very rare cases like this, where the BIOS had the ability to
screw up with irq overrides and then used that freedom to screw up.

> So there are two unrelated irq problems:
>   1) vga/agp
>   2) timer

Roughly.  The BIOS says the same irq is for both uses, but that irq
input does not work as a timer input.

####################################################
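The remark in the quoted discussion that linux irq 0 doubles as "no irq has been assigned" can be illustrated with a small user-space sketch. This is not kernel code: struct fake_dev and fake_request_irq() are hypothetical names invented for this illustration, standing in for the kind of "is the irq zero?" check a driver might perform before claiming an interrupt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a device as a driver sees it. */
struct fake_dev {
	const char *name;
	unsigned int irq;	/* 0 is treated as "no irq assigned" */
};

/*
 * Hypothetical helper modelling a driver that refuses to claim an
 * interrupt when the irq number is 0.
 * Returns 0 if the irq would be requested, -1 if it is skipped.
 */
static int fake_request_irq(const struct fake_dev *dev)
{
	assert(dev != NULL);

	if (!dev->irq) {
		printf("%s: no irq assigned, not claiming an interrupt\n",
		       dev->name);
		return -1;
	}

	printf("%s: claiming irq %u\n", dev->name, dev->irq);
	return 0;
}
```

Under this model, a VGA device that ends up with irq 0 is indistinguishable from one that has no interrupt at all, which would be consistent with the card not claiming its interrupt on the newer kernels.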
Comment 3 Szymon Kowalczyk 2012-01-28 17:11:48 UTC
Created attachment 72214 [details]
output of vga irq (lspci, dmesg, acpidump, uname) on 3.2.0.7(bad) and 2.6.34-12(ok)
Comment 4 Zhang Rui 2012-05-24 08:03:55 UTC
Feng,
there is another interrupt problem. Would you please have a look at it?
Comment 5 Feng Tang 2012-05-24 13:04:59 UTC
Hi Szymon,

Could you please post the "/proc/interrupts" and full "lspci -vvvx" output and your kernel config for both the good and the bad kernel? Thanks,

BTW, could you check which kernel is the last known good one besides the working 2.6.34: 2.6.35? 2.6.36? If you have time to do a bisect, that would be perfect :)

- Feng
Comment 6 Szymon Kowalczyk 2012-05-24 19:43:35 UTC
Created attachment 73383 [details]
lspci -vvvx
Comment 7 Szymon Kowalczyk 2012-05-24 19:44:34 UTC
Created attachment 73384 [details]
/proc/interrupts
Comment 8 Szymon Kowalczyk 2012-05-24 20:04:32 UTC
About kernel versions:

The last working kernel version was from

openSUSE 11.3 (2.6.34)

The problem appeared in

openSUSE 11.4 (2.6.37.1)

About the config:

Both were the desktop kernels delivered with openSUSE.

openSUSE 11.3

http://kernel.opensuse.org/cgit/kernel-source/plain/config/i386/desktop?h=openSUSE-11.3

openSUSE 11.4

http://kernel.opensuse.org/cgit/kernel-source/plain/config/i386/desktop?h=openSUSE-11.4

Please tell me exactly which kernel I should test.
Comment 9 Feng Tang 2012-05-25 00:53:11 UTC
Created attachment 73385 [details]
acpi_debug.patch

Hi Szymon,

Could you test this debug patch and give me the dmesg log?

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 4558f0d..8e06cef 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -106,8 +106,12 @@ static unsigned int gsi_to_irq(unsigned int gsi)
 	unsigned int irq = gsi + NR_IRQS_LEGACY;
 	unsigned int i;
 
+	printk("%s(): gsi = %d, gsi_top = %d, NR_IRQS_LEGACY = %d\n",
+		__func__, gsi, gsi_top, NR_IRQS_LEGACY);
+
 	for (i = 0; i < NR_IRQS_LEGACY; i++) {
 		if (isa_irq_to_gsi[i] == gsi) {
+			printk("%s(): we found a isa match, i = %d\n", __func__, i); 
 			return i;
 		}
 	}
@@ -562,7 +566,9 @@ int acpi_register_gsi(struct device *dev, u32 gsi, int trigger, int polarity)
 	unsigned int plat_gsi = gsi;
 
 	plat_gsi = (*__acpi_register_gsi)(dev, gsi, trigger, polarity);
+	printk("%s(): plat_gsi = %d\n", __func__, plat_gsi);
 	irq = gsi_to_irq(plat_gsi);
+	printk("%s(): irq = %d\n", __func__, irq);
 
 	return irq;
 }
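
The lookup that this debug patch instruments can be modelled in user space to show why GSI 16 may come back as irq 0. The following is a minimal sketch, not the kernel function: apply_override() and model_gsi_to_irq() are hypothetical names, the sketch assumes that an interrupt source override is stored in isa_irq_to_gsi[], and the identity fallback at the end is an assumption because the tail of gsi_to_irq() is not shown in the patch.

```c
#include <assert.h>

#define NR_IRQS_LEGACY 16	/* number of legacy ISA irqs on x86 */

/* ISA irq -> GSI table; identity until an override rewrites an entry. */
static unsigned int isa_irq_to_gsi[NR_IRQS_LEGACY] = {
	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
};

/* Hypothetical model of an override: ISA irq source_irq is wired to gsi. */
static void apply_override(unsigned int source_irq, unsigned int gsi)
{
	assert(source_irq < NR_IRQS_LEGACY);
	isa_irq_to_gsi[source_irq] = gsi;
}

/*
 * Simplified model of the loop shown in the patch: an ISA match wins.
 * The fallback below is an assumption of this sketch.
 */
static unsigned int model_gsi_to_irq(unsigned int gsi)
{
	unsigned int i;

	for (i = 0; i < NR_IRQS_LEGACY; i++) {
		if (isa_irq_to_gsi[i] == gsi)
			return i;
	}

	return gsi;	/* assumed fallback: irq number equals the GSI */
}
```

In this model, model_gsi_to_irq(16) returns 16 on a clean table, but after apply_override(0, 16) the ISA loop finds a match at index 0 and returns 0, which mirrors the "GSI 16 -> IRQ 0" line in the report.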
Comment 10 Feng Tang 2012-05-25 03:17:06 UTC
I think I may have found the root cause of this issue. Could you please try this patch with kernel 3.2 to see if it fixes the issue?

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 8e06cef..0be07d9 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -420,6 +420,11 @@ acpi_parse_int_src_ovr(struct acpi_subtable_header * header,
                return 0;
        }

+       if (intsrc->source_irq == 0 && intsrc->global_irq == 16) {
+               printk(PREFIX "BIOS IRQ0 pin0 --> pin16 override ignored.\n");
+               return 0;
+       }
+
        if (intsrc->source_irq == 0 && intsrc->global_irq == 2) {
                if (acpi_skip_timer_override) {
                        printk(PREFIX "BIOS IRQ0 pin2 override ignored.\n");
Comment 11 Feng Tang 2012-05-25 14:26:12 UTC
Hi Szymon,

Regardless of whether the last patch fixes the issue, please add "apic=debug" to the kernel command line and post the dmesg. Thanks,

- Feng
Comment 12 Szymon Kowalczyk 2012-05-26 08:42:09 UTC
OK, it seems to be fine :)

Logs with apic=debug are below.

Should I send this patch to the openSUSE bugzilla as a fix?
Comment 13 Szymon Kowalczyk 2012-05-26 08:53:04 UTC
Created attachment 73402 [details]
dmesg with apic=debug after fix and debug patch
Comment 14 Szymon Kowalczyk 2012-05-26 08:56:32 UTC
Created attachment 73403 [details]
lspci -vvvx after fix patch
Comment 15 Szymon Kowalczyk 2012-05-26 08:57:52 UTC
Created attachment 73404 [details]
/proc/interrupts after fix patch
Comment 16 Feng Tang 2012-05-28 01:14:27 UTC
(In reply to comment #12)
> OK, it seems to be fine :)
> 
> Logs with apic=debug are below.
> 
> Should I send this patch to the openSUSE bugzilla as a fix?

Glad to know the patch works.

You can send it to the openSUSE bugzilla, but it is certainly a hack and not in good shape for upstream.

Could you post the "sudo dmidecode" output? Maybe we can add a quirk for your system.

- Feng
Comment 17 Szymon Kowalczyk 2012-05-28 05:07:44 UTC
Created attachment 73435 [details]
dmidecode after fix
Comment 18 Szymon Kowalczyk 2012-05-28 05:15:35 UTC
I attached dmidecode output.

This may be a stupid question, but I have been thinking about this "hack":

Is there any legitimate case where

intsrc->source_irq is 0

while

intsrc->global_irq is > 0?

I ask because there is a similar check later in the code:

intsrc->global_irq == 2

Maybe there should be something like:

 if (intsrc->source_irq == 0 && intsrc->global_irq > 0) {
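
To make the difference between the two conditions concrete, here is a minimal user-space sketch. The struct and function names (fake_int_src_ovr, ignore_hack, ignore_proposed) are hypothetical and only mirror the two fields used in the patch; this is not the code that was eventually merged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for the two override fields used in the patch. */
struct fake_int_src_ovr {
	unsigned int source_irq;	/* ISA irq being overridden */
	unsigned int global_irq;	/* GSI it is redirected to */
};

/* Condition from the test patch in comment 10: only the 0 -> 16 case. */
static bool ignore_hack(const struct fake_int_src_ovr *o)
{
	assert(o != NULL);
	return o->source_irq == 0 && o->global_irq == 16;
}

/* Condition proposed in comment 18: any IRQ0 override to a non-zero GSI. */
static bool ignore_proposed(const struct fake_int_src_ovr *o)
{
	assert(o != NULL);
	return o->source_irq == 0 && o->global_irq > 0;
}
```

The proposed form also matches the IRQ0 to pin 2 override, which the existing code only ignores when acpi_skip_timer_override is set, so ignoring it unconditionally would change behaviour on machines where that override is valid.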
Comment 19 Feng Tang 2012-05-28 05:59:48 UTC
Created attachment 73436 [details]
new acpi patch

Yes, that's similar to what I was thinking.

Could you try the new attached patch and post the results and dmesg?
Comment 20 Szymon Kowalczyk 2012-05-29 02:06:21 UTC
Created attachment 73451 [details]
dmesg with acpi=debug, patched kernel

The patch is working :)
Comment 21 Szymon Kowalczyk 2012-05-29 02:12:39 UTC
Comment on attachment 73451 [details]
dmesg with acpi=debug, patched kernel

I found this in dmesg:

[    0.000000] Using APIC driver default
[    0.000000] ------------[ cut here ]------------
[    0.000000] WARNING: at /usr/src/linux-3.1.10-1.9/arch/x86/kernel/acpi/boot.c:1358 dmi_ignore_irq0_timer_override+0x2c/0x53()
[    0.000000] Hardware name: AMILO PRO V2030
[    0.000000] ati_ixp4x0 quirk not complete.
[    0.000000] Modules linked in:
[    0.000000] Pid: 0, comm: swapper Not tainted 3.1.10-1.9-desktop #1
[    0.000000] Call Trace:
[    0.000000]  [<c0205433>] try_stack_unwind+0x163/0x180
[    0.000000]  [<c0204167>] dump_trace+0x47/0xf0
[    0.000000]  [<c020549b>] show_trace_log_lvl+0x4b/0x60
[    0.000000]  [<c02054c8>] show_trace+0x18/0x20
[    0.000000]  [<c06f4cff>] dump_stack+0x6d/0x72
[    0.000000]  [<c0248868>] warn_slowpath_common+0x78/0xb0
[    0.000000]  [<c0248933>] warn_slowpath_fmt+0x33/0x40
[    0.000000]  [<c0ae05e9>] dmi_ignore_irq0_timer_override+0x2c/0x53
[    0.000000]  [<c05d6c28>] dmi_check_system+0x28/0x40
[    0.000000]  [<c0ae0ead>] acpi_boot_init+0xa/0x60
[    0.000000]  [<c0ada20b>] setup_arch+0x636/0x6bf
[    0.000000]  [<c0ad7488>] start_kernel+0x7e/0x369
[    0.000000] ---[ end trace 4eaa2a86a8e2da22 ]---
[    0.000000] FUJITSU SIEMENS detected: Ignoring BIOS IRQ0 pin2 override
Comment 22 Feng Tang 2012-05-29 02:18:44 UTC
Yes, the warning message is expected.

The root cause of this issue is buggy firmware that assigns GSI 16 to two IRQs; the patch adds a quirk that ignores one of the assignments.

The OS shows this warning when it detects this quirk.

If you really hate this warning, you can add "acpi_skip_timer_override" to your kernel command line.
Comment 23 Florian Mickler 2012-07-01 09:46:32 UTC
A patch referencing this bug report has been merged in Linux v3.5-rc5:

commit f6b54f083cc66cf9b11d2120d8df3c2ad4e0836d
Author: Feng Tang <feng.tang@intel.com>
Date:   Mon Jun 4 15:00:06 2012 +0800

    ACPI: Add a quirk for "AMILO PRO V2030" to ignore the timer overriding
Comment 24 Florian Mickler 2012-07-01 09:48:06 UTC
A patch referencing this bug report has been merged in Linux v3.5-rc5:

commit ae10ccdc3093486f8c2369d227583f9d79f628e5
Author: Feng Tang <feng.tang@intel.com>
Date:   Mon Jun 4 15:00:04 2012 +0800

    ACPI: Make acpi_skip_timer_override cover all source_irq==0 cases
