Bug 1123
Description
Felipe Alfaro Solana
2003-08-18 15:50:26 UTC
Created attachment 666 [details]
config used to compile 2.6.0-test3-bk6
Created attachment 667 [details]
dmesg output after booting *with* a PS/2 mouse plugged in
Created attachment 668 [details]
Modified version of i8042.c with some additional printk() calls
This is the i8042.c version of 2.6.0-test3-bk6 with additional printk()'s to
find where exactly the kernel was hanging.
It turned out that the hang occurs while requesting IRQ #12 by calling
request_irq() inside i8042_check_aux().
Created attachment 669 [details]
a copy of /proc/interrupts
Created attachment 670 [details]
output of the "lspci -vvv" command
This is the output of "lspci -vvv" while running a 2.4.21 kernel.
Created attachment 671 [details]
The System.map file for the 2.6.0-test3-bk6 kernel
I think this System.map may be useful for deciphering all those "initcall"
debug messages.
- same for 2.6.0-test4 (independent of whether I plug in the mouse or not, though)
- seems to hang in request_irq in i8042_check_mux

What is the most recent kernel that you know worked? I'm wondering if this is due to recent ACPI 20030813 changes, or earlier. Also, when you "disabled ACPI" and the system worked, did you use acpi=off? Can you confirm that pci=noacpi also boots? Attach the output from dmidecode, available in /usr/sbin/, or here: http://www.nongnu.org/dmidecode/ Attach the output from acpidmp, available in /usr/sbin/, or here: http://www.intel.com/technology/iapc/acpi/downloads/pmtools-20010730.tar.gz Attach the dmesg output showing the failure, if possible.

The most recent kernel that worked for me is any of the -test3 series, for example, -test3-mm3, if my memory serves me well. When I said I disabled ACPI, I meant that I took all ACPI code out of the kernel and instead compiled it using APM exclusively. I have also tried leaving ACPI compiled in (with no APM) and booting with "pci=noacpi", and it works perfectly. Next, I will attach a dmesg of 2.6.0-test4-bk1 booted with "pci=noacpi", the output of dmesg when booting normally, and also the output of "dmidecode" and "acpidmp".
Created attachment 702 [details]
output of the "acpidmp"
This is the output of running "acpidmp" from the Intel pmtools package on
the P4 machine, while running a 2.6.0-test4-bk1 kernel booted up using
"pci=noacpi".
Created attachment 703 [details]
output of the "dmidecode" tool
This is the output of the "dmidecode" tool from Red Hat's kernel-utils-2.4-8.34
package.
Created attachment 704 [details]
output of "dmesg" for 2.6.0-test4-bk1 booted with "pci=noacpi"
Created attachment 705 [details]
output of "dmesg" for 2.6.0-test4-bk1 booted without "pci=noacpi"
- linux-2.6.0-test3 was the last kernel that worked for me
- linux-2.6.0-test4 is the first kernel that has/exposes the bug
- linux-2.6.0-test4 hangs in i8042_check_mux() in request_irq()
- linux-2.6.0-test4 with acpi=off works
- linux-2.6.0-test4 with pci=noacpi works
Created attachment 706 [details]
ingok@gmx.net: output of dmidecode
Created attachment 707 [details]
ingok@gmx.net: output of acpidmp

OK, I've checked for a new BIOS and found one dated July 7th, 2003. This new BIOS supports IO-APIC and the MPS 1.4 specification. I have flashed this new BIOS and found that 2.6.0-test4-bk1 boots fine if ACPI and APIC+IOAPIC are both enabled at the same time. Booting this new kernel with "noapic" causes the original hang of this report when probing for the AUX port. Booting with "noapic pci=noacpi" also works. Thus, the system behaves exactly the same as with the old BIOS, but since this new BIOS includes APIC support, enabling APIC and IO-APIC also solves all problems.

I have attached a new "dmesg" for 2.6.0-test4-bk1 with ACPI and APIC enabled, booted with no extra kernel parameters. I have also added the output of "lspci -vvv" and a copy of "/proc/interrupts" taken while the system is running with APIC and ACPI. Additionally, I have superseded my old "dmidecode" and "acpidmp" attachments with new ones.
Created attachment 710 [details]
dmesg output of 2.6.0-test4-bk1 booted up with ACPI and APIC
This is the dmesg output of 2.6.0-test4-bk1 running on a Platinix 2D/533-A
motherboard with the latest BIOS, and booted with no extra kernel parameters,
that is, ACPI is enabled and so is IO-APIC.
I can't attach the dmesg output when booting with "noapic pci=noacpi" since the
machine hangs while probing for the AUX port and, since I'm booting with no
APIC support, the dmesg output differs considerably with respect to the dmesg
of booting with APIC enabled.
Created attachment 711 [details]
output of the "acpidmp" tool
Created attachment 712 [details]
output of the "dmidecode" tool
Created attachment 713 [details]
a copy of /proc/interrupts
Created attachment 714 [details]
output of the "lspci -vvv" command
I am experiencing this problem also, on a p4 motherboard with a Via chipset. I have determined that when the system hangs, it is actually repeatedly calling i8042_interrupt() in an infinite loop. The first call to i8042_interrupt occurs when setup_irq in arch/i386/kernel/irq.c calls spin_unlock_irqrestore (i.e., the moment when interrupt processing by i8042_interrupt becomes available).
Created attachment 720 [details]
output of dmidecode for freya.yggdrasil.com
Created attachment 721 [details]
"lspci -vvv" output for freya.yggdrasil.com
Created attachment 722 [details]
/proc/interrupts for freya.yggdrasil.com
Is there any chance of getting the boot messages from a failed boot (through the serial port)?
Created attachment 738 [details]
serial console output for freya.yggdrasil.com
The lines beginning with "AJR" are from printk calls that I added. I have
left them in, because they are pretty self-explanatory and provide some
useful information. In particular, they show that the interrupt handler
is being called in some kind of infinite loop. Perhaps there is some kind
of interrupt acknowledgement problem or some kind of edge versus level
misconfiguration.
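To illustrate the edge versus level hypothesis above, here is a minimal user-space sketch. It is not kernel code and not taken from i8042.c. It models a single interrupt line and counts how often a handler would be invoked. The names `irq_line`, `irq_pending`, and `count_handler_calls` are hypothetical. The point is only that a line sampled as level-triggered keeps firing for as long as it stays asserted, so a handler that never clears the source is re-entered without end, while the same situation on an edge-triggered line fires once.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one interrupt line; not kernel code. */
enum trigger { TRIGGER_EDGE, TRIGGER_LEVEL };

struct irq_line {
    enum trigger mode; /* how the controller samples the line */
    bool asserted;     /* current state of the line */
    bool prev;         /* state at the previous sample (edge detection) */
};

/* Returns true if the controller would deliver an interrupt for this sample. */
static bool irq_pending(struct irq_line *l)
{
    bool fire;

    if (l->mode == TRIGGER_LEVEL)
        fire = l->asserted;             /* fires as long as the line is high */
    else
        fire = l->asserted && !l->prev; /* fires only on a low-to-high edge */
    l->prev = l->asserted;
    return fire;
}

/*
 * Counts handler invocations for a line that starts out asserted.
 * handler_clears says whether the handler deasserts the line.
 * max_calls caps the loop so that a storm terminates in this model.
 */
unsigned long count_handler_calls(enum trigger mode, bool handler_clears,
                                  unsigned long max_calls)
{
    struct irq_line l = { mode, true, false };
    unsigned long calls = 0;

    assert(max_calls > 0);
    while (calls < max_calls && irq_pending(&l)) {
        calls++;
        if (handler_clears)
            l.asserted = false;
    }
    return calls;
}
```

For example, `count_handler_calls(TRIGGER_LEVEL, false, 1000)` runs until the cap is reached, which corresponds to the looping handler seen in the serial console log, whereas either clearing the source or sampling the line as edge-triggered yields a single call.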
Created attachment 745 [details]
freya.yggdrasil.com console log from older kernel
This is a log of the console output under 2.6.0-test3, which does not
experience this problem. I believe someone asked for it in order to compare
with the failing 2.6.0-test4 output.
We have acknowledged this problem and are working on it.

*** Bug 1144 has been marked as a duplicate of this bug. ***

Please see bug #10; it looks like a similar issue that was root-caused there.

I tried the "Initial fix for GA-7VAX" given in bug #10, and my system still hung at the same point as before.

Would you please retry without Plug and Play support built in? Thanks a lot!

Also, I need you to try a UP kernel instead of an SMP kernel (with ACPI fully enabled and the debug option turned on). And please remove your printk calls. Thanks a lot.

I just tried out test5 and am getting the same result.

This continues in test5.

I will not be able to test new kernels on the machine that experiences this problem until at least 2003.10.04. So, I would appreciate it if someone else who is experiencing this problem would try Luming's requests (comments 34 and 35), which were the following. From comment 34, retry without Plug and Play support (not sure if this means ISA PnP or some other configuration option). From comment 35, retry with: CONFIG_SMP not set, CONFIG_ACPI set, and debug option (?) enabled.

Would you please give the patch at bug 1186 a try? Thanks a lot.

The patch at bug 1186 (http://bugme.osdl.org/attachment.cgi?id=903&action=view) fixes the problem for me.

Confirmed, this patch makes the problem go away... 2.6.0-test5-bk9, UP, ACPI, VIA KT400

Yep, the patch at bug 1186 (http://bugme.osdl.org/attachment.cgi?id=903&action=view) works for me (comment 14, http://bugzilla.kernel.org/show_bug.cgi?id=1123#c14) too.

With plain 2.6.0-test6 the system boots and AFAICS everything works OK, but there is an 'irq 12: nobody cared!' message exactly where the hang used to be. dmesg output follows.
Created attachment 958 [details]
dmesg output for 2.6.0-test6
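The 'irq 12: nobody cared!' message means that an interrupt arrived on IRQ 12 and no registered handler claimed it. As a rough user-space illustration only (not the kernel's code), the sketch below counts unclaimed interrupts per window and disables the line once too many go unclaimed, which is one way a storm can be turned into a warning instead of a hang. The struct, the function names, and the `window` and `limit` parameters are all hypothetical; the actual thresholds and reporting policy of 2.6.0-test6 are not stated in this report.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-line bookkeeping; not the kernel's data structure. */
struct irq_stats {
    unsigned long count;     /* interrupts seen in the current window */
    unsigned long unhandled; /* of those, how many no handler claimed */
    bool disabled;           /* line switched off as "nobody cared" */
};

/*
 * Records one interrupt. Once `window` interrupts have been seen, the
 * line is disabled if more than `limit` of them went unclaimed, and the
 * counters start over. Returns true if the line is disabled.
 */
bool note_irq(struct irq_stats *s, bool handled,
              unsigned long window, unsigned long limit)
{
    assert(window > 0 && limit <= window);
    if (s->disabled)
        return true;
    if (!handled)
        s->unhandled++;
    s->count++;
    if (s->count < window)
        return false;        /* window not complete yet */
    if (s->unhandled > limit)
        s->disabled = true;  /* nobody cared: stop the storm */
    s->count = 0;
    s->unhandled = 0;
    return s->disabled;
}

/* Feeds n identical interrupts; returns true if the line ends up disabled. */
bool feed_irqs(struct irq_stats *s, unsigned long n, bool handled,
               unsigned long window, unsigned long limit)
{
    bool off = s->disabled;

    for (unsigned long i = 0; i < n; i++)
        off = note_irq(s, handled, window, limit);
    return off;
}
```

With a window of 10 and a limit of 8, ten unclaimed interrupts in a row disable the line, while ten claimed ones leave it enabled.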
Did you try the patch that fixed your problem on -test5? Did this kind of error message appear on -test5?

Just checked: the patch from bug 1186 makes the system boot without the error message for test6, as was the case for test5.

The machine that had this problem with 2.6.0-test4 does not have this problem with 2.6.0-test8. 2.6.0-test8 does not have the patch that Ingo pointed to from bug 1186 ( http://bugme.osdl.org/attachment.cgi?id=903&action=view ), but drivers/acpi/pci_link.c has changed somewhat between 2.6.0-test4 and 2.6.0-test8. Does anyone still have this problem as of 2.6.0-test8?

Please re-open if you still have PS/2 IRQ problems with the latest 2.4 or 2.6 kernel. Thanks, -Len