Distribution: Debian unstable, custom kernel
Hardware Environment: Asus M2N motherboard (with latest BIOS) with an AMD Athlon 64 X2 5000+
Problem Description: I get a kernel panic when booting. Booted with apic=debug, the screen shows:

enable extInt on Cpu#0
esr value before enabling vector 0x00000004 after 0x00000000
Booting processor 1/1 eip 2000
Initializing CPU#1
Calibrating delay using timer specific routine.. 5201.73 BogoMIPS (lpj=2600869)
....
....
Total of 2 processors activated (10431.14 BogoMIPS).
ENABLING IO-APIC IRQ
..TIMER: vector=0x31 apic1=0 pin1=0 apic2=-1 pin2=-1
..MP BIOS bug 8254 timer not connected to io-apic
...trying to set up timer (irq0) through 8259A .. failed
...trying to set up timer as VirtualWire irq .. failed
...trying to set up timer as ExtInt Irq .. failed :(
kernelpanic - not syncing: io-apic + timer doesn't work!

Steps to reproduce: Enable ACPI APIC support in the BIOS; disabling it gets me a working system.
Created attachment 12619 [details] dmesg when apic support is disabled (working system)
Created attachment 12620 [details] config file custom kernel
(In reply to comment #2)
> Created an attachment (id=12620) [details]
> config file custom kernel

Will you please upload the full dmesg and acpidump data?
Created attachment 12681 [details] acpidump booted with noapic
I already attached the full dmesg. If that one is wrong or incomplete, please give me some hints on getting a proper dmesg report.
(In reply to comment #4)
> Created an attachment (id=12681) [details]
> acpidump booted with noapic
> I already attached the full dmesg, if that one is wrong or incomplete
> please give me some hints on getting a proper dmesg report

Thanks for the acpidump, but the dmesg is incomplete. Will you please get the full dmesg with the boot option apic=debug?
Created attachment 12696 [details] dmesg with cmdline: noapic apic=debug
Created attachment 12697 [details] dmesg with apic=debug, apic disabled in bios
I uploaded both because there are some big differences between them. For further debugging, please tell me how I should boot: with noapic, with apic turned off in the BIOS, or of course both? Thanks
Created attachment 12698 [details] acpidump booted with apic turned off in bios
Just noticed that the acpidump differs as well, hope it helps.
None of the dmesg files go back to the start of boot. Please build with CONFIG_LOG_BUF_SHIFT=18 and collect the dmesg using dmesg -s64000
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 0 dfl dfl)

The BIOS has a no-op interrupt source override on irq0, where the timer is supposed to be. If you go to BIOS SETUP and choose the system defaults, does it enable the IOAPIC or disable it? Please attach the output from lspci and the output from dmidecode. This appears to be a chipset, BIOS, or motherboard bug, and we may be able to work around it. Also, please boot with "acpi=off" when the BIOS has IOAPIC enabled and report whether the system is able to get into IOAPIC mode or boots in PIC mode instead (cat /proc/interrupts).
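As an illustration of what "no-op override" means here, the following is a minimal Python sketch that parses the dmesg line quoted above and checks whether the override maps the bus IRQ onto itself. The function names and the regular expression are assumptions made for this sketch; they are not part of the kernel or of any ACPI tool.

```python
import re

# Matches the override line in the form quoted in this comment.
# The two trailing words ("dfl dfl") are kept as opaque flags.
OVERRIDE_RE = re.compile(
    r"ACPI: INT_SRC_OVR \(bus (\d+) bus_irq (\d+) global_irq (\d+) (\S+) (\S+)\)"
)


def parse_override(line):
    """Return the override fields as a dict, or None if the line does not match."""
    m = OVERRIDE_RE.search(line)
    if m is None:
        return None
    bus, bus_irq, global_irq, flag_a, flag_b = m.groups()
    return {
        "bus": int(bus),
        "bus_irq": int(bus_irq),
        "global_irq": int(global_irq),
        "flags": (flag_a, flag_b),
    }


def is_noop_override(override):
    """An override that maps an IRQ onto itself changes nothing."""
    return override["bus_irq"] == override["global_irq"]


if __name__ == "__main__":
    line = "ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 0 dfl dfl)"
    override = parse_override(line)
    print(override, "no-op" if is_noop_override(override) else "remaps the IRQ")
```

Run against a saved dmesg, the same check could be applied to every INT_SRC_OVR line to spot overrides that do not actually remap anything.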
(In reply to comment #8)
The uploaded dmesg is incomplete. Please upload the full dmesg. Compared with the acpidump in comment #4, the acpidump in comment #8 has no "APIC" table. The APIC table describes the connection between the 8254 timer and the I/O APIC. In comment #4 that connection info is incorrect, so the kernel reports the BIOS bug. When ACPI APIC support is disabled in the BIOS, the system won't report that the 8254 timer can't connect to the I/O APIC, because there is no APIC table.
Created attachment 12738 [details] dmesg with cmdline: noapic apic=debug
Created attachment 12739 [details] dmidecode dump
Created attachment 12740 [details] verbose lspci dump
Created attachment 12741 [details] /proc/interrupts when acpi is off
The BIOS defaults turn IOAPIC on. (On a side note, without IOAPIC WinXP won't find my dual-core processor; only one core works.) Booting with acpi=off works (with IOAPIC on), however I do get an "irq 11: nobody cared" and I have no network. Other than that it looks like it does go into IOAPIC mode, though /proc/interrupts still shows an XT-PIC-XT entry.
Created attachment 12742 [details] /proc/interrupts with noapic
Created attachment 12743 [details] dmesg with apic=debug acpi=off
(In reply to comment #17)
> Created an attachment (id=12743) [details]
> dmesg with apic=debug acpi=off

Thanks for the info. Will you please retry with the boot options acpi=off and noapic? From comments #15 and #16 we can see that there are abnormally many EMU10K1 interrupts.
Created attachment 12766 [details] /proc/interrupts with noapic and acpi off
/proc/interrupts now has far fewer interrupts
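To compare interrupt counts between two boots without eyeballing the attachments, a small parser for the /proc/interrupts layout can help. The Python below is a rough sketch; the sample text and its numbers are invented for illustration and are not taken from the attached files, and the function names are assumptions of this sketch.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts text into {label: (total_count, description)}."""
    lines = text.strip("\n").splitlines()
    ncpus = len(lines[0].split())  # header row: one column per CPU
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        tokens = rest.split()
        counts = []
        # Leading integer tokens are per-CPU counts; the rest is the description.
        while tokens and len(counts) < ncpus and tokens[0].isdigit():
            counts.append(int(tokens.pop(0)))
        table[label.strip()] = (sum(counts), " ".join(tokens))
    return table


def busiest(table):
    """Return the label of the interrupt line with the highest total count."""
    return max(table, key=lambda label: table[label][0])


# Made-up sample in the usual two-CPU layout (values are NOT from this report).
SAMPLE = """\
           CPU0       CPU1
  0:       5120          0   IO-APIC-edge      timer
 11:     900000          0   XT-PIC-XT         EMU10K1
ERR:          1
"""

if __name__ == "__main__":
    table = parse_interrupts(SAMPLE)
    worst = busiest(table)
    print(worst, table[worst])
```

Parsing the two attachments this way and printing the busiest line for each would make the drop in interrupt count easy to quantify.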
Created attachment 12767 [details] dmesg with noapic acpi=off
(In reply to comment #19)
> Created an attachment (id=12766) [details]
> /proc/interrupts with noapic and acpi off
>
> /proc/interrupts now has far fewer interrupts
>

Does the system work well when booted with noapic acpi=off? Please attach the output of lspci with the boot options noapic acpi=off. Thanks.
Created attachment 12777 [details] verbose lspci dump (noapic with acpi=off)
System seems to be working (audio, net, nvidia all work)
Hi, Arjan
Thanks for the info. Will you please check whether the PCI and ACPI debug options are enabled in the kernel configuration? Boot with acpi=off apic=debug debug initcall_debug and upload the dmesg. Then please try the boot options apic=debug debug initcall_debug (with ACPI enabled) and upload whatever info you can get. Thanks.
Created attachment 12836 [details] dmesg with cmdline: acpi=off apic=debug debug initcall_debug
Created attachment 12837 [details] lspci dump with acpi=off apic=debug debug initcall_debug
The lspci output shows changes compared to the last lspci dump
Created attachment 12838 [details] lspci dump with noapic apic=debug debug initcall_debug Again, some small changes compared to the other lspci
Created attachment 12839 [details] dm
Created attachment 12840 [details] dmesg with cmdline: noapic apic=debug debug initcall_debug
Hi,
From comment #23
> Please try it with boot option of apic=debug debug initcall_debug( acpi is
> enabled)

This gives me a kernel panic as before. Instead I uploaded a dmesg with noapic apic=debug debug initcall_debug, hope it helps.
(In reply to comment #28)
I forgot to mention that I have ACPI debug and PCI debug enabled now (CONFIG_PCI_DEBUG and CONFIG_DEBUG_KERNEL were not enabled before)
Created attachment 12943 [details] Get some info about MPS table
Hi, Arjan
Will you please get some info using the test patch? After applying the patch, boot with the options acpi=off apic=debug. Once the system is up, please upload the following info:
a. dmesg (complete dmesg)
b. lspci -xxx
Thanks
Created attachment 12955 [details] dmesg with the MPS patch (acpi=off, apic=debug)
Created attachment 12956 [details] lspci -xxx
Hi Yakui, Hope it's useful info
Hi, Arjan
Thanks for the info. The info is very useful. Will you please try to boot the system with the boot options hpet=disable apic=debug debug? Thanks.
Created attachment 12982 [details] dmesg with noapic hpet=disable apic=debug debug
Hi Yakui,
The system does not boot with hpet=disable apic=debug debug (same kernel panic). I will upload another dmesg with acpi=off hpet=disable apic=debug debug.
Created attachment 12983 [details] dmesg with acpi=off hpet=disable apic=debug debug
Hi, Arjan
Thanks. Can you check whether there is a serial port on your machine? If there is, please capture the boot messages (including the kernel panic message) through the serial port. It is very easy to get boot messages this way. The boot option for the serial port is "console=ttyS0,115200n8".
Created attachment 12985 [details] workaround patch about I/O APIC+timer
After applying the patch, please add the boot options acpi_use_timer_override hpet=disable apic=debug. Thanks.
Several bugs report that the I/O APIC and the timer can't be connected on NVIDIA chipsets. Maybe this is caused by a bug in NVIDIA's chipset.

In theory there are two paths for the connection between the timer and the CPU:
a. The timer is connected to pin 2 of the I/O APIC.
b. The timer is connected to pin 0 of the i8259, and the INTR output pin of the i8259 is connected to pin 0 of the I/O APIC and to LINT0 of the CPU local APIC.

During system initialization the timer is handled as follows:
1. Initialize the i8259 interrupt controller (PIC mode is used before the I/O APIC is enabled).
2. Initialize the timer in the function time_init; the timer begins to work.
3. Compute the BogoMIPS using the timer interrupt.
4. Change the timer route from the i8259 to the I/O APIC and check whether the timer still works when the I/O APIC is used. If it doesn't, the timer route is restored to the i8259.

On the NVIDIA chipset we observe several behaviours:
1. acpi=off: the timer is connected to pin 2 of the I/O APIC and works well. The route between the timer and the I/O APIC is defined in the MPS table and is correct.
2. noapic: the timer is connected to pin 0 of the i8259 and works well.
3. acpi=on: the timer works before the I/O APIC is enabled. After the timer route is changed to the I/O APIC, the timer stops working. But when the timer route is restored to the i8259, the timer still doesn't work. This is abnormal.

On the NVIDIA chipset the route between the timer and the I/O APIC isn't defined correctly in the APIC table. But it is strange that the timer doesn't work after the timer route is restored to the i8259. There are two possible explanations:
a. The i8259 can't work after the I/O APIC is enabled.
b. The connection between INTR of the i8259 and LINT0 of the CPU is cut off after the I/O APIC is enabled.

Because we can't get the specification of the NVIDIA chipset, we can't give a definite explanation for this.
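The fallback sequence described above (and visible in the panic log from the original report) can be sketched as a toy model in Python. This is not kernel code: the route names are taken from the log messages, while the function name, the exception class and the probe callables are assumptions made purely for illustration.

```python
class TimerSetupError(RuntimeError):
    """Raised when no timer route delivers ticks (the kernel panics here)."""


def check_timer(routes):
    """Try each timer route in order and return the name of the first that works.

    `routes` is an ordered list of (name, probe) pairs, where probe() returns
    True if timer interrupts arrive over that route.
    """
    for name, probe in routes:
        if probe():
            return name
    raise TimerSetupError("IO-APIC + timer doesn't work!")


def make_routes(io_apic_ok, i8259_ok, virtual_wire_ok, extint_ok):
    """Build the route list in the order shown in the boot log."""
    return [
        ("IO-APIC pin", lambda: io_apic_ok),
        ("8259A", lambda: i8259_ok),
        ("VirtualWire", lambda: virtual_wire_ok),
        ("ExtINT", lambda: extint_ok),
    ]


if __name__ == "__main__":
    # Behaviour 3 above (acpi=on on this board): every route fails.
    try:
        check_timer(make_routes(False, False, False, False))
    except TimerSetupError as err:
        print("panic:", err)

    # What one would expect if restoring the i8259 route worked.
    print("timer via", check_timer(make_routes(False, True, False, False)))
```

The abnormal part of this bug is the first scenario: the 8259A route, which worked before the I/O APIC was enabled, no longer delivers ticks when the code falls back to it.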
Hi Yakui,
Sorry for the late reply, but when I finally got my hands on a null modem cable I noticed that I didn't have a COM port, or at least no bracket. When I finally found a serial bracket, things still didn't work because I got the pin layout wrong (so some soldering is required on the bracket). So I don't have debug messages yet. What I can tell is that the patch from comment #37 is not working: the same kernel panic shows up again. Thanks for the help so far.
Created attachment 13034 [details] dmesg with workaround patch cmd acpi_use_timer_override hpet=disable apic=debug debug
OK, I got my serial port working, and my last reply was incorrect. Applying the patch gives me a booting system (I made a typo with the patch). I will also upload the dmesg captured with the serial port. Do note that I moved my audio card one slot up to make room for the serial port, so lspci gives a different output now.
Created attachment 13035 [details] lspci -vvv
Created attachment 13036 [details] lspci -xxx
Created attachment 13037 [details] /proc/interrupts with acpi_use_timer_override
Created attachment 13038 [details] dmesg with apic=debug debug
Created attachment 13039 [details] dmesg with hpet=disable apic=debug debug
Hi, Arjan
Thanks for the info. From comment #40 it seems that the timer and the I/O APIC work well with the workaround patch. So the error in comment #45 is caused by the faulty connection between the timer and the I/O APIC.
Hi, Arjan
Have you tried to boot the system with the boot option acpi_use_timer_override after applying the workaround patch from comment #37 (without adding hpet=disable)? Thanks.
Created attachment 13153 [details] dmesg with workaround patch cmd acpi_use_timer_override apic=debug debug Hi, The system seems to work fine with hpet enabled (with the workaround patch)
Created attachment 13158 [details] Using the workaround patch
Hi, Arjan
Thanks for the info. It seems that the system works well with the workaround patch, and that this bug is caused by a BIOS bug. The patch fixing the timer override on NVIDIA systems will soon be available.
Now there is another problem. When booted with acpi=off, the system reports that nobody cares about IRQ 11. Will you please test the workaround patch? After applying it, boot the system with the options acpi=off apic=debug initcall_debug and upload the following two files:
a. dmesg
b. more /proc/interrupts
Thanks.
Created attachment 13168 [details] dmesg with the pci_irq_workaround patch cmd acpi=off apic=debug initcall_debug
Hi Yakui,
I used the attached patch from comment #49, but I still get an "irq nobody cared". I will upload the dmesg and /proc/interrupts. Thanks
Created attachment 13169 [details] /proc/interrupts with the pci_irq_workaround patch
(In reply to comment #51)
> Created an attachment (id=13169) [details]
> /proc/interrupts with the pci_irq_workaround patch
>

Forgot to mention that at least the network does not work.
Thanks for the test. The workaround patch doesn't solve the bug, but it helps to narrow down the cause of the "nobody cared" report for IRQ 11.
Hi, Arjan
Thanks for your info. Now there is another problem with your system. When booted with acpi=off, the system reports that nobody cares about IRQ 11. From comments #50, #19 and #20, the problem appears to be caused by a BIOS bug (Ethernet PCI interrupt routing).
1. When ACPI is enabled, the ethernet PCI interrupt is routed to pin 23 of the I/O APIC through the LINK device (LSA0) and the IRQ number is 16. This seems to work well.
2. When booted with acpi=off noapic, the ethernet PCI interrupt is routed to pin 11 of the i8259 (defined directly by the PCI device). eth0 seems to work well.
3. When booted with acpi=off, the ethernet PCI interrupt is routed to pin 15 of the I/O APIC according to the definition in the MPS table (defined by the BIOS), and IRQ 15 is enabled. But the route is incorrect and the ethernet device is still connected to pin 11 of the I/O APIC. After the ethernet device is enabled, IRQ 15 never receives the interrupt, and IRQ 11 has no handler for the interrupts triggered by the ethernet device. So the system reports that nobody cared about IRQ 11.
Based on the above analysis, the problem is caused by a BIOS bug, so we won't fix it. The patch for the timer override will be available soon. Thanks.
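The three routing cases can be summarised in a few lines of Python. This is only a bookkeeping sketch of the analysis above: the IRQ numbers come from this comment, while the table layout and the function name are assumptions of the sketch.

```python
# For each boot configuration: (IRQ line the ethernet device actually raises,
# IRQ line on which the kernel registered the driver's handler).
ROUTING = {
    "ACPI enabled": (16, 16),      # routed via the LINK device, works
    "acpi=off noapic": (11, 11),   # i8259 pin 11, works
    "acpi=off": (11, 15),          # MPS table says 15, hardware raises 11
}


def diagnose(raised_on, handler_on):
    """Describe what happens when the device raises an interrupt."""
    if raised_on == handler_on:
        return "ok"
    return f"irq {raised_on}: nobody cared (handler waits on IRQ {handler_on})"


if __name__ == "__main__":
    for config, (raised_on, handler_on) in ROUTING.items():
        print(f"{config:16s} -> {diagnose(raised_on, handler_on)}")
```

Only the acpi=off case has a mismatch between where the interrupt arrives and where the handler is registered, which matches the observed "irq 11: nobody cared" message and the dead network.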
(In reply to comment #54)
Hi Yakui,
Thanks for all the help. I saw there was a new BIOS available, but I still get the same errors, so the timer override patch is the way to go. Thanks again
Hi, Arjan
A timer override is required on your system. Unfortunately the timer override provided by the BIOS is incorrect, so the system can't work properly. The situation on your system is also different from that of NF3, NF4 or NF5. It is more appropriate to modify the timer override using an explicit DMI entry. Will you please provide the dmidecode info? Thanks.
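To make the idea of an "explicit DMI entry" concrete, here is a rough Python model of a quirk table keyed on DMI strings. It is only an analogy for how such a table could be consulted: the vendor and board strings are placeholders (not values from the attached dmidecode dump), the field names are hypothetical, and substring matching is a choice made for this sketch. The pin number 2 reflects the statement earlier in this report that the timer works on pin 2 of the I/O APIC.

```python
# Hypothetical quirk table. The match strings are placeholders, NOT the real
# DMI values of the board discussed in this report.
TIMER_OVERRIDE_QUIRKS = [
    {
        "name": "example board with broken irq0 override",
        "match": {"board_vendor": "EXAMPLE VENDOR", "board_name": "EXAMPLE-BOARD"},
        "timer_pin": 2,  # I/O APIC pin to use for the timer instead
    },
]


def find_quirk(dmi, quirks=TIMER_OVERRIDE_QUIRKS):
    """Return the first quirk whose match strings all occur in the DMI fields."""
    for quirk in quirks:
        if all(wanted in dmi.get(field, "") for field, wanted in quirk["match"].items()):
            return quirk
    return None


if __name__ == "__main__":
    dmi = {"board_vendor": "EXAMPLE VENDOR Inc.", "board_name": "EXAMPLE-BOARD rev 1"}
    quirk = find_quirk(dmi)
    if quirk is not None:
        print("apply quirk:", quirk["name"], "-> timer on pin", quirk["timer_pin"])
    else:
        print("no quirk, trust the BIOS tables")
```

The appeal of this approach is that the correction applies only to boards that are positively identified, leaving other NVIDIA systems on their existing code path.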
Created attachment 13345 [details] dmidecode (with the latest bios) Hi Yakui, dmidecode was already attached?, but here it is again with the latest bios
Hi, Arjan
The timer is routed to pin 2 of the I/O APIC on this system. Unfortunately the timer override entry provided by the MADT is incorrect, so the system can't work normally. At the same time, the situation on this system is different from that of NF3, NF4 or NF5. The error is caused by the incorrect ACPI table. I think it is more reasonable to resolve this problem with a BIOS update, and we won't fix it. Thanks for the info.