Description Felipe Alfaro Solana 2003-08-18 15:50:26 UTC
Distribution: Red Hat Linux Taroon Beta 1

Hardware Environment:
Pentium IV 2.0 GHz
i845DE-based motherboard
ATI Radeon 7000 VE
3Com Corporation 3c905C-TX-M
hda: ST380021A, ATA DISK drive
hdb: IBM-DTLA-307030, ATA DISK drive
hdc: Pioneer DVD-ROM ATAPIModel DVD-500M B00, ATAPI CD/DVD-ROM drive
hdd: PLEXTOR CD-R PX-W4824A, ATAPI CD/DVD-ROM drive
Microsoft Natural PS/2 keyboard
No mouse

Software Environment: Red Hat Linux Taroon Beta 1

Problem Description:
If I try to boot my P4 box (i845DE motherboard) with no PS/2 mouse plugged into the PS/2 port, the kernel hangs while checking the AUX ports in function i8042_check_aux(). The i8042_check_aux() function tries to request IRQ #12, and the call to request_irq() causes the hang. The kernel hangs exactly at:

    if (request_irq(values->irq, i8042_interrupt, SA_SHIRQ, "i8042", i8042_request_irq_cookie))

in drivers/input/serio/i8042.c, with a value of 12 for values->irq.

If I boot with my PS/2 mouse attached, the kernel boots normally. Also, disabling ACPI support in the kernel allows me to boot 2.6.0-test3-bk6 with no PS/2 mouse plugged in.

I have booted the kernel with initcall_debug=1 to see where it was hanging. It's not very informative (the last initcall reported is i8042_init), but I've left it enabled, as you will see in the "dmesg" output.

Attached to this report are System.map (useful since I've enabled initcall debug) and the source of the i8042.c that I modified with plenty of printk() messages to find where the hang was being produced. Also attached are the output of "lspci -vvv" while running a 2.4.21 kernel, "dmesg" for 2.6.0-test3-bk6, the list of interrupts being used (when booted with a PS/2 mouse attached), and the "config" file used to compile the kernel.

Steps to reproduce:
1. Compile 2.6.0-test3-bk6 using the supplied config file
2. Try to boot it on an i845DE motherboard while no mouse is plugged in
Comment 1 Felipe Alfaro Solana 2003-08-18 15:51:31 UTC
Created attachment 666 [details] config used to compile 2.6.0-test3-bk6
Comment 2 Felipe Alfaro Solana 2003-08-18 15:52:01 UTC
Created attachment 667 [details] dmesg output after booting *with* a PS/2 mouse plugged in
Comment 3 Felipe Alfaro Solana 2003-08-18 15:53:57 UTC
Created attachment 668 [details] Modified version of i8042.c with some additional printk() This is the i8042.c version of 2.6.0-test3-bk6 with additional printk() calls to find exactly where the kernel was hanging. It turned out that the problem occurs while trying to request IRQ #12 by calling request_irq() inside i8042_check_aux().
Comment 4 Felipe Alfaro Solana 2003-08-18 15:54:17 UTC
Created attachment 669 [details] a copy of /proc/interrupts
Comment 5 Felipe Alfaro Solana 2003-08-18 15:55:04 UTC
Created attachment 670 [details] output of the "lspci -vvv" command This is the output of "lspci -vvv" while running a 2.4.21 kernel.
Comment 6 Felipe Alfaro Solana 2003-08-18 15:56:26 UTC
Created attachment 671 [details] The System.map file for the 2.6.0-test3-bk6 kernel I think this System.map can be useful for deciphering all those "initcall" debug messages.
Comment 7 ingo 2003-08-23 02:31:32 UTC
- same for 2.6.0-test4 (independent of whether I plug in the mouse or not, though)
- seems to hang in request_irq in i8042_check_mux
Comment 8 Len Brown 2003-08-23 19:49:06 UTC
What is the most recent kernel that you know worked? I'm wondering if this is due to the recent ACPI 20030813 changes, or earlier ones. Also, when you "disabled ACPI" and the system worked, did you use acpi=off? Can you confirm that pci=noacpi also boots? Attach the output from dmidecode, available in /usr/sbin/, or here: http://www.nongnu.org/dmidecode/ Attach the output from acpidmp, available in /usr/sbin/, or here: http://www.intel.com/technology/iapc/acpi/downloads/pmtools-20010730.tar.gz Attach the dmesg output showing the failure, if possible.
Comment 9 Felipe Alfaro Solana 2003-08-24 04:38:01 UTC
The most recent kernel that worked for me is any of the -test3 series, for example, -test3-mm3, if my memory serves me well. When I said I disabled ACPI, I meant that I took all ACPI code out of the kernel and compiled it using APM exclusively. I have also tried leaving ACPI compiled in (with no APM) and booting with "pci=noacpi", and it works perfectly. Following this comment, I will attach a dmesg of 2.6.0-test4-bk1 booted with "pci=noacpi", the output of dmesg when booting normally, and also the output of "dmidecode" and "acpidmp".
Comment 10 Felipe Alfaro Solana 2003-08-24 04:40:24 UTC
Created attachment 702 [details] output of the "acpidmp" This is the output of running "acpidmp" from the Intel pmtools package on the P4 machine, while running a 2.6.0-test4-bk1 kernel booted with "pci=noacpi".
Comment 11 Felipe Alfaro Solana 2003-08-24 04:43:00 UTC
Created attachment 703 [details] output of the "dmidecode" tool This is the output of the "dmidecode" tool from Red Hat's kernel-utils-2.4-8.34 package.
Comment 12 Felipe Alfaro Solana 2003-08-24 04:43:57 UTC
Created attachment 704 [details] output of "dmesg" for 2.6.0-test4-bk1 booted with "pci=noacpi"
Comment 13 Felipe Alfaro Solana 2003-08-24 04:44:34 UTC
Created attachment 705 [details] output of "dmesg" for 2.6.0-test4-bk1 booted without "pci=noacpi"
Comment 14 ingo 2003-08-24 04:47:41 UTC
- linux-2.6.0-test3 was the last kernel that worked for me
- linux-2.6.0-test4 is the first kernel that has/exposes the bug
- linux-2.6.0-test4 hangs in i8042_check_mux() in request_irq()
- linux-2.6.0-test4 with acpi=off works
- linux-2.6.0-test4 with pci=noacpi works
Comment 15 ingo 2003-08-24 04:51:31 UTC
Created attachment 706 [details] firstname.lastname@example.org: output of dmidecode
Comment 16 ingo 2003-08-24 04:52:39 UTC
Created attachment 707 [details] email@example.com: output of acpidmp
Comment 17 Felipe Alfaro Solana 2003-08-24 07:02:52 UTC
OK, I've checked for a new BIOS and found one dated July 7th, 2003. This new BIOS supports IO-APIC and the MPS 1.4 specification. I have flashed this new BIOS and found that 2.6.0-test4-bk1 boots fine if ACPI and APIC+IOAPIC are both enabled at the same time. Booting this new kernel with "noapic" causes the original hang described in this report when probing for the AUX port. Booting with "noapic pci=noacpi" works. Thus, the system behaves exactly the same as with the old BIOS, but since this new BIOS includes APIC support, enabling APIC and IO-APIC also solves all problems. I have attached a new "dmesg" for 2.6.0-test4-bk1 with ACPI and APIC enabled, booted with no extra kernel parameters. I have also added the output of "lspci -vvv" and a copy of "/proc/interrupts" taken while the system was running with APIC and ACPI. Additionally, I have superseded my old "dmidecode" and "acpidmp" attachments with new ones.
Comment 18 Felipe Alfaro Solana 2003-08-24 07:05:43 UTC
Created attachment 710 [details] dmesg output of 2.6.0-test4-bk1 booted up with ACPI and APIC This is the dmesg output of 2.6.0-test4-bk1 running on a Platinix 2D/533-A motherboard with the latest BIOS, booted with no extra kernel parameters, that is, with both ACPI and IO-APIC enabled. I can't attach the dmesg output for a boot with "noapic", since the machine hangs while probing for the AUX port. Note that when booting with no APIC support, the dmesg output differs considerably from the dmesg of a boot with APIC enabled.
Comment 19 Felipe Alfaro Solana 2003-08-24 07:06:16 UTC
Created attachment 711 [details] output of the "acpidmp" tool
Comment 20 Felipe Alfaro Solana 2003-08-24 07:07:15 UTC
Created attachment 712 [details] output of the "dmidecode" tool
Comment 21 Felipe Alfaro Solana 2003-08-24 07:07:51 UTC
Created attachment 713 [details] a copy of /proc/interrupts
Comment 22 Felipe Alfaro Solana 2003-08-24 07:08:13 UTC
Created attachment 714 [details] output of the "lspci -vvv" command
Comment 23 Adam J. Richter 2003-08-24 23:50:34 UTC
I am experiencing this problem also, on a p4 motherboard with a Via chipset. I have determined that when the system hangs, it is actually repeatedly calling i8042_interrupt() in an infinite loop. The first call to i8042_interrupt occurs when setup_irq in arch/i386/kernel/irq.c calls spin_unlock_irqrestore (i.e., the moment when interrupt processing by i8042_interrupt becomes available).
Comment 24 Adam J. Richter 2003-08-25 00:01:26 UTC
Created attachment 720 [details] output of dmidecode for freya.yggdrasil.com
Comment 25 Adam J. Richter 2003-08-25 00:02:52 UTC
Created attachment 721 [details] "lspci -vvv" output for freya.yggdrasil.com
Comment 26 Adam J. Richter 2003-08-25 00:04:13 UTC
Created attachment 722 [details] /proc/interrupts for freya.yggdrasil.com
Comment 27 Shaohua 2003-08-26 00:39:16 UTC
Any chance of getting the boot messages from a failed boot (through a serial port)?
Comment 28 Adam J. Richter 2003-08-26 02:40:12 UTC
Created attachment 738 [details] serial console output for freya.yggdrasil.com The lines beginning with "AJR" are from printk calls that I added. I have left them in, because they are pretty self-explanatory and provide some useful information. In particular, they show that the interrupt handler is being called in some kind of infinite loop. Perhaps there is some kind of interrupt acknowledgement problem or some kind of edge versus level misconfiguration.
Comment 29 Adam J. Richter 2003-08-26 23:35:01 UTC
Created attachment 745 [details] freya.yggdrasil.com console log from older kernel This is a log of the console output under 2.6.0-test3, which does not experience this problem. I believe someone asked for it in order to compare with the failing 2.6.0-test4 output.
Comment 30 Jun Nakajima 2003-08-27 08:03:18 UTC
We have acknowledged this problem and are working on it.
Comment 31 Greg Kroah-Hartman 2003-08-28 10:13:30 UTC
*** Bug 1144 has been marked as a duplicate of this bug. ***
Comment 32 Luming Yu 2003-09-03 07:42:14 UTC
Please refer to bug #10; this looks like a similar issue, which was root-caused there.
Comment 33 Adam J. Richter 2003-09-03 08:55:56 UTC
I tried the "Initial fix for GA-7VAX" given in bug #10, and my system still hung at the same point as before.
Comment 34 Luming Yu 2003-09-08 20:45:19 UTC
Would you please retry without Plug and Play support built-in? Thanks a lot!
Comment 35 Luming Yu 2003-09-08 23:20:35 UTC
Also, I need you to try a UP kernel instead of an SMP kernel (with ACPI fully enabled and the debug option turned on). And please remove your printk calls. Thanks a lot.
Comment 36 Zach Gelnett 2003-09-12 06:41:19 UTC
I just tried out test5 and am getting the same result.
Comment 37 mrmailer 2003-09-13 20:55:47 UTC
this continues in test5.
Comment 38 Adam J. Richter 2003-09-15 16:33:39 UTC
I will not be able to test new kernels on the machine that experiences this problem until at least 2003.10.04. So, I would appreciate it if someone else who is experiencing this problem would try Luming's requests (comments 34 and 35), which were the following. From comment 34, retry without Plug and Play support (not sure if this means ISA PnP or some other configuration option). From comment 35, retry with: CONFIG_SMP not set, CONFIG_ACPI set, and debug option (?) enabled.
Comment 39 Luming Yu 2003-09-17 00:52:28 UTC
Would you please give the patch at bug 1186 a try? Thanks a lot.
Comment 40 Felipe Alfaro Solana 2003-09-17 05:20:57 UTC
The patch at bug 1186 (http://bugme.osdl.org/attachment.cgi?id=903&action=view) fixes the problem for me.
Comment 41 Petr Sebor 2003-09-22 16:34:07 UTC
Confirmed, this patch makes the problem go away... 2.6.0-test5-bk9, UP, ACPI, VIA KT400
Comment 42 ingo 2003-09-25 03:08:13 UTC
yep, the patch at bug 1186 (http://bugme.osdl.org/attachment.cgi?id=903&action=view) works for me (see comment 14, http://bugzilla.kernel.org/show_bug.cgi?id=1123#c14) too.
Comment 43 ingo 2003-09-29 07:39:22 UTC
with plain 2.6.0-test6 the system boots and AFAICS everything works ok, but there is an 'irq 12: nobody cared!' message exactly where the hang used to be. dmesg output follows
Comment 44 ingo 2003-09-29 07:40:30 UTC
Created attachment 958 [details] dmesg output for 2.6.0-test6
Comment 45 Luming Yu 2003-09-29 18:33:26 UTC
Did you try the patch that fixed your problem on -test5? Did this kind of error message appear on -test5?
Comment 46 ingo 2003-09-30 02:17:48 UTC
just checked, the patch from bug 1186 makes the system boot without error message for test6, as was the case for test5.
Comment 47 Adam J. Richter 2003-10-20 07:55:11 UTC
The machine that had this problem with 2.6.0-test4 does not have this problem with 2.6.0-test8. 2.6.0-test8 does not have the patch that Ingo pointed to from bug 1186 ( http://bugme.osdl.org/attachment.cgi?id=903&action=view ), but drivers/acpi/pci_link.c has changed somewhat between 2.6.0-test4 and 2.6.0-test8. Does anyone still have this problem as of 2.6.0-test8?
Comment 48 Len Brown 2003-11-12 20:35:25 UTC
please re-open if you still have ps/2 IRQ problems with the latest 2.4 or 2.6 kernel. thanks, -Len