Distribution: Gentoo
Hardware Environment: nForce2 chipset + SiI3112A onboard SATA controller
Problem Description: When using APIC, file transfers hang the system if DMA modes are enabled. When disabling DMA with hdparm, the system is stable. With APIC not compiled in (ACPI still enabled), the system is rock solid even with DMA on. This problem happens with both the onboard PATA (nForce2) and onboard SATA (SiI3112A). Other people on the nforcershq.com forums confirm this.
Steps to reproduce: Enable ACPI, APIC, and DMA transfers and just use the system for a few minutes. Running hdparm -t or bonnie a few times, or just `cat /dev/hde > /dev/null`, hangs the system in a few seconds.
If more information is needed, just ask.
Please create attachments for acpidmp, dmesg at boot time, lspci -vv, /proc/interrupts, and serial console output at kernel hang. Thanks a lot!
Created attachment 1491 [details] acpidmp output of nforce2 a7n8x-deluxe
This is the requested info for the original bug. I did not file the original report but am seeing the same problems, so I am attaching this to it in the hope of helping to fix this problem.
Created attachment 1492 [details] dmesg output with noapic nolapic passed to kernel at boot
Created attachment 1493 [details] dmesg output with apic and lapic enabled
Created attachment 1494 [details] /proc/interrupts with noapic nolapic boot parameters
Created attachment 1495 [details] /proc/interrupts with apic enabled
Created attachment 1496 [details] lspci -vv lspci -vv of Asus A7n8x deluxe nforce2 motherboard. Anything else I could do to possibly get this bug closed I would be more than happy to do.
This bug appears on an Abit nForce2 mainboard, too (kernel 2.6.0 release). It was really hard to trace this one, so please put up at least a warning message for nforce2 users. I got the system to run stable with "noapic nolapic"; "noapic" alone was not enough. (The kernel-parameters documentation should be fixed, as I read "noapic - Tells the kernel not to make use of *any* APIC that may be present on the system." as "not even if a local APIC is found.") If you need any additional information, let me know!
Experiencing hangs with APIC/IOAPIC enabled in kernel mm-2.6.1 on nforce2 (Asus A7N8X Deluxe). I have noticed the problem particularly when playing streaming audio as well as during large disk transfers. Let me know if additional ACPI/interrupt/lspci information is needed and I will post it.
I had this problem as well on a Shuttle AN35N, until a BIOS dated Dec. 5th came out. Since then I haven't been able to reproduce a hang with APIC enabled, whether C1 disconnect is set to on, off, or auto. It seems to be fixed on this Shuttle board.
I can also confirm that this is a problem with the 2.6.3 kernel. I have an ASUS A7N8X Deluxe with the nForce2 chipset, using IDE hard disk drives. The problem can also be reproduced in kernel versions 2.6.3-mm1, 2.6.2, 2.4.24, and 2.4.25-pre6 through to 2.4.25 (final). To regain a stable system, I also have to compile without APIC support (but leaving ACPI). I also pass noapic and nolapic to the kernel on boot - the system is then fine (even with "hdparm -d1 -c3 /dev/hdX"). I can provide more information/output if needed - just ask.
Same thing with 2.6.4 on an nforce2 ultra400 Abit NF7-S v2.0 with latest BIOS. Slackware 9.1, PATA. With DMA, LAPIC, and IOAPIC it hangs under higher disk activity. XMMS doesn't hang it, and neither does reading Linux documentation in text mode. Copying something like a directory tree hangs it in a couple of minutes. Loading Opera naturally does too. With DMA disabled it's rock solid, no hangs. Oh, by the way, when it hangs (with DMA enabled, I mean) not even Magic SysRq works: no response, just nothing. Please get this bug fixed, it's really a pain in the ass for me. Thanks in advance guys.
I am sure that this isn't only IDE related. I have only SCSI disks attached to an Adaptec AHA2940U2W on an Abit NF7-S 2.0. With APIC enabled the system hangs sometimes even before getting to the login prompt. I tried kernels 2.4.23 to 2.4.25 and 2.6.0 to 2.6.4. A very interesting point is that it doesn't hang with APIC enabled and the processor clocked to 1 GHz instead of 2.2 GHz (tested some time ago with 2.4.23 and 2.6.0).
I have tested v2.4.26-rc1 and the problem still exists--with the NForce2 NVidia chipset, the system will hang (hard reset is the only way out) when one does DMA to the disk. This is with two different motherboards--the Shuttle FN45, and a BioStar M7NCG. I have not done any testing with 2.6 series kernels. steve@cs.ualberta.ca
In APIC mode, the timer interrupt is being set up incorrectly on nforce2 boards, as can be seen by the XT-PIC on IRQ0 in /proc/interrupts:

           CPU0
  0:      91405          XT-PIC  timer
  1:        137    IO-APIC-edge  i8042
  2:          0          XT-PIC  cascade
  8:          1    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 14:         54    IO-APIC-edge  ide0
 15:       6681    IO-APIC-edge  ide1
 19:       4097   IO-APIC-level  nvidia
 20:          0   IO-APIC-level  ohci_hcd
 21:          0   IO-APIC-level  ehci_hcd, NVidia nForce2
 22:        210   IO-APIC-level  ohci_hcd

Ross Dickson's patch should address this:
http://marc.theaimsgroup.com/?l=linux-kernel&m=108181916319469&w=2

Does addressing the timer also address the hang, or is that still there?

thanks,
-Len
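The check described above (is IRQ0 still routed through XT-PIC?) can be scripted. The following is only a rough sketch: the function names are made up for illustration, and the sample text is an excerpt of the listing in this comment rather than a live read of /proc/interrupts.

```python
# Sketch: find which interrupt controller each numbered IRQ is attached to.
# Function names are hypothetical; SAMPLE is an excerpt of the listing above.
SAMPLE = """\
           CPU0
  0:      91405          XT-PIC  timer
  1:        137    IO-APIC-edge  i8042
  2:          0          XT-PIC  cascade
 14:         54    IO-APIC-edge  ide0
 21:          0   IO-APIC-level  ehci_hcd, NVidia nForce2
NMI:          0
"""


def irq_controllers(text):
    """Map each numbered IRQ to the controller name shown for it."""
    controllers = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or not fields[0].endswith(":"):
            continue  # header line, or summary rows such as "NMI: 0"
        label = fields[0][:-1]
        if not label.isdigit():
            continue  # NMI, LOC, ERR, MIS are not numbered IRQs
        rest = fields[1:]
        while rest and rest[0].isdigit():
            rest = rest[1:]  # skip the per-CPU interrupt counts
        if rest:
            controllers[int(label)] = rest[0]
    return controllers


def timer_on_xt_pic(text):
    """True if IRQ0 (the timer) is still handled by the legacy XT-PIC."""
    return irq_controllers(text).get(0) == "XT-PIC"
```

On a running system the text would come from reading /proc/interrupts; an XT-PIC result for IRQ0 while other IRQs show IO-APIC is the symptom described in this comment.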
Created attachment 2574 [details] C1halt fix for APIC on Nforce2 boards

This fix allows me to now run APIC on my Nforce2 board without lockups. I used to get a hard lockup within minutes if I ran a kernel with APIC. Even with this kernel, I can get a hard lockup if I don't include the "idle=C1halt" kernel command line parameter. This patch is from Ross Dickson. I am using a base kernel of 2.6.5-mc4.

Original kernel message with patch:
http://www.ussg.iu.edu/hypermail/linux/kernel/0402.1/0810.html
More recent traffic on same:
http://www.ussg.iu.edu/hypermail/linux/kernel/0404.0/1613.html

From my system:

hikaru root # uname -a
Linux hikaru 2.6.5-mc4-as102 #1 Sun Apr 11 15:09:09 EST 2004 i686 AMD Athlon(tm) XP AuthenticAMD GNU/Linux
hikaru root # cat /proc/cmdline
BOOT_IMAGE=Tux-as102 ro root=306 idle=C1halt reboot=warm elevator=cfq pci=noacpi gentoo=nodevfs
hikaru root # cat /proc/interrupts
           CPU0
  0:    6108674          XT-PIC  timer
  1:       7893          XT-PIC  i8042
  2:          0          XT-PIC  cascade
  5:      81252          XT-PIC  ohci_hcd
  8:          2          XT-PIC  rtc
  9:          0          XT-PIC  acpi
 10:          0          XT-PIC  ehci_hcd
 11:     479542          XT-PIC  ohci1394, eth0, nvidia
 12:          0          XT-PIC  ohci_hcd, NVidia nForce2
 14:      33698          XT-PIC  ide0
 15:      19810          XT-PIC  ide1
NMI:          0
LOC:    6108610
ERR:    2048498
The system in this example is booted with pci=noacpi and running in XT-PIC mode. Such a system would work even without the idle patch, yes? Does simply booting the vanilla kernel with "idle=poll" work (though it may consume more power at idle)?
This kernel does not work if booted without an "idle=", or booted with "idle=halt". This kernel does work if booted with "idle=C1halt" or "idle=poll". I will rebuild this kernel without the C1halt patch to complete the test.
Created attachment 2575 [details] "acpi_skip_timer_override" 2.6.5 patch for APIC mode timer

If you boot in ACPI+APIC mode and still have an XT-PIC timer, then please apply this patch and boot with "acpi_skip_timer_override". It will work around the BIOS bug that erroneously maps the timer IRQ0 to pin 2; instead it will take the timer from pin 0 and give you IO-APIC-edge on the timer. This will not address the hang, but may be required for proper timer operation in ACPI+APIC mode.
It would be _very_ interesting to know if a vanilla 2.6.5 kernel (with "acpi_skip_timer_override" applied & enabled, if needed) booted in ACPI+APIC mode does not hang with "idle=poll". If so, then that proves that the C1idle folks on LKML are on the right track, and dealing with HALT is necessary and sufficient to address the hang on this hardware.
A quick note: My kernel includes APIC, but not IOAPIC.

hikaru root # fgrep APIC /boot/config-2.6.5-mc4-as102
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_UP_APIC=y
# CONFIG_X86_UP_IOAPIC is not set
CONFIG_X86_LOCAL_APIC=y
Yes, please add IOAPIC. I guess I should have said ACPI+LAPIC+IOAPIC w/ "idle=poll".
It looks like Len is on the right track. I've tested his suggestion with Debian's kernel 2.6.5 with "idle=poll" (timer patch was applied but not enabled).

Here's my config:

# grep APIC /usr/src/linux/.config
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_UP_APIC=y
CONFIG_X86_UP_IOAPIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y

Here are my interrupts:

# cat /proc/interrupts
           CPU0
  0:    1705044          XT-PIC  timer
  1:       3559    IO-APIC-edge  i8042
  2:          0          XT-PIC  cascade
  8:          4    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 12:      38974    IO-APIC-edge  i8042
 14:     598396    IO-APIC-edge  ide0
 15:         67    IO-APIC-edge  ide1
 19:     126318   IO-APIC-level  radeon@pci:0000:03:00.0
 20:         30   IO-APIC-level  ohci_hcd
 21:      50506   IO-APIC-level  NVidia nForce2, eth0
 22:          0   IO-APIC-level  ohci_hcd
NMI:          0
LOC:    1704893
ERR:          0
MIS:          0

My crash test is to run 100 "fsck -f /dev/hda5" on a filesystem containing about 1GB. With hda set at udma5 and without "idle=poll", my machine (a7n8x deluxe) hangs during the first fsck. In fact I have never been able to finish a fsck on a 2.6.x kernel. Now the test has run the 100 fsck without hanging.

You know what? I'm happy 8-) Thanks a bunch
I've been following this for a while now and am very happy to see some development with this (tricky) bug. I've been doing some tests - here are my results:

2.6.5 - acpi_skip_timer_override patch (not enabled) w/ idle=poll
===================================================================
oppressed:~# cat /proc/interrupts
           CPU0
  0:     263882          XT-PIC  timer
  1:        943    IO-APIC-edge  i8042
  2:          0          XT-PIC  cascade
  8:          4    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 12:       3242    IO-APIC-edge  i8042
 14:     225547    IO-APIC-edge  ide0
 15:         36    IO-APIC-edge  ide1
 17:        321   IO-APIC-level  eth0
 18:          0   IO-APIC-level  EMU10K1
 19:      31155   IO-APIC-level  nvidia, nvidia
 20:          0   IO-APIC-level  ohci_hcd
 21:          0   IO-APIC-level  ehci_hcd
 22:        145   IO-APIC-level  ohci_hcd
NMI:          0
LOC:     263709
ERR:          0
MIS:          0
oppressed:~# cat /proc/cmdline
BOOT_IMAGE=Debian ro root=342 idle=poll Linux

2.6.5 - acpi_skip_timer_override patch (enabled) w/ idle=poll
===================================================================
oppressed:~# cat /proc/interrupts
           CPU0
  0:      80590    IO-APIC-edge  timer
  1:        113    IO-APIC-edge  i8042
  2:          0          XT-PIC  cascade
  8:          4    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 12:        501    IO-APIC-edge  i8042
 14:     150647    IO-APIC-edge  ide0
 15:         37    IO-APIC-edge  ide1
 17:        185   IO-APIC-level  eth0
 18:          0   IO-APIC-level  EMU10K1
 19:       4346   IO-APIC-level  nvidia, nvidia
 20:          0   IO-APIC-level  ohci_hcd
 21:          0   IO-APIC-level  ehci_hcd
 22:        140   IO-APIC-level  ohci_hcd
NMI:          0
LOC:      80005
ERR:          0
MIS:          0
oppressed:~# cat /proc/cmdline
BOOT_IMAGE=Debian ro root=342 acpi_skip_timer_override idle=poll Linux

Both of the above result in a hard lockup, as per usual. I still need to boot with "noapic nolapic" in order to get a stable system.

david@oppressed:/usr/src/linux$ grep APIC .config
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_UP_APIC=y
CONFIG_X86_UP_IOAPIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y

ACPI is also enabled. I'm more than happy to try other things and supply more information if needed (like my dot config file). I have not tried the C1halt fix as of yet.
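Since the test runs in this thread differ mainly in their boot parameters, it can help to record them mechanically with each report. A minimal sketch follows, assuming a simple whitespace-separated command line; the helper name is made up, and the kernel's own parser also handles quoting, which this ignores.

```python
# Sketch: split a kernel command line into options so the relevant
# workaround flags can be reported alongside a test result.
# parse_cmdline is a hypothetical helper, not a kernel or library API.
def parse_cmdline(cmdline):
    """Return {option: value}, with True for bare flags such as "noapic"."""
    options = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        options[key] = value if sep else True
    return options


# Command line copied from the second test run in this comment.
opts = parse_cmdline(
    "BOOT_IMAGE=Debian ro root=342 acpi_skip_timer_override idle=poll Linux"
)
print("idle mode:", opts.get("idle", "(default)"))
print("timer override skipped:", "acpi_skip_timer_override" in opts)
print("APIC disabled:", "noapic" in opts or "nolapic" in opts)
```

On a live system the input string would be the contents of /proc/cmdline.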
I tried to apply the C1halt patch but it complains the patch file is malformed at line 15. I don't understand fully how APIC, ACPI etc. work (I'm not a hardware person) but when I booted with "nolapic" in addition to "acpi_skip_timer_override idle=poll", my system didn't fully boot - I saw complaints about IRQ settings related to USB and the NIC also didn't function correctly (didn't pick up DHCP). When my graphical login manager started, I only got a black screen and was unable to switch to a console - the machine had not frozen however, as the caps lock light etc. still responded. The above may be of no or little help with this bug but thought I would mention it just in case.
Created attachment 2637 [details] rediff of C1halt patch for 2.6.5 Here is a rediff of the C1halt patch, confirmed to apply to a stock 2.6.5 kernel.
Thanks - the C1halt patch applied fine on a 2.6.5 kernel, with the acpi_skip_timer_override patch applied afterwards (to set the timer to use APIC instead of XT-PIC). Booting with "acpi_skip_timer_override idle=C1halt" doesn't seem to resolve the issue, but something different happens: instead of getting a hard lockup, the machine will instantly reboot itself. I can always reproduce this when uncompressing VMware, for example. I've uncompressed the same archive on an smbfs mount and it doesn't reboot, just to ensure it wasn't another bug somewhere along the line. What other testing and combinations can I perform?
I've got a Shuttle SN41G2 with a FN41 MB and a AMD Athlon(tm) XP 2200+ with BIOS 1/14/2004. The BIOS has an option to enable C1 disconnect, which I've enabled. But I've been unable to provoke a system hang. I'm running 2.6.5 + the timer workaround, ACPI enabled, APIC enabled.
Found the entire BIOS history of the FN41 on ftp.shuttle.com and reverted the system to the initial release, FN41S00X, dating back to 12/18/2002, but still no hang. Re-compiled the kernel with CONFIG_SMP=n, but still no hang. I've got an Athlon 2200+ in this box; I don't know if the hang is specific to processor model.
My board hangs with an Athlon XP 2600+ (I think it's a tuned-down Barton core). Falko Meyer in comment #13 indicates that the hangs do not happen if the processor is slowed down, so it's probably related to the processor's speed. Maybe we need more statistical data on faulty boards. What kind of board? FSB speed? Processor type? Actual core? Processor speed? What else? Do you think this data would help you?
Dominique,

I have an AMD XP 2600+ myself on an ASUS A7N8X board. I had to set the processor speed in the BIOS itself because the board does not autodetect the speed correctly otherwise. I shall note down what exactly I set in the BIOS when I next reboot and add another comment to this bug, for reference.

All,

Also, I have had a development. Unfortunately, it seems that with previous crashes from the bug, my ReiserFS filesystem was getting trashed, and this itself was causing random hangs and reboots. After a reiserfsck and rebuilding the reiser tree on my /home, this seems to have helped the problem. I am now running a 2.6.5 kernel with the acpi_skip_timer_override patch and the rediffed C1halt patch for 2.6.5.

oppressed:~# cat /proc/interrupts
           CPU0
  0:    3986019    IO-APIC-edge  timer
  1:       6967    IO-APIC-edge  i8042
  2:          0          XT-PIC  cascade
  8:          4    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 12:      72442    IO-APIC-edge  i8042
 14:      57388    IO-APIC-edge  ide0
 15:       5765    IO-APIC-edge  ide1
 17:      19012   IO-APIC-level  eth0
 18:         63   IO-APIC-level  EMU10K1
 19:     556775   IO-APIC-level  nvidia, nvidia
 20:          0   IO-APIC-level  ohci_hcd
 21:          0   IO-APIC-level  ehci_hcd
 22:        285   IO-APIC-level  ohci_hcd
NMI:          0
LOC:    3958283
ERR:          0
MIS:          0
oppressed:~#
oppressed:~# cat /proc/cmdline
BOOT_IMAGE=Debian ro root=342 acpi_skip_timer_override idle=C1halt Linux

I can now confirm that since fixing my filesystem and running the above I have not had any hard freezes, kernel panics, reboots etc. for around 1 week. I am also running "hdparm -d1 -c3 /dev/hdb" on startup.

Len,

Can you update the "Kernel version" this bug applies to (if it's needed)?
nvidia has acknowledged a hardware cause for this hang, remarkably similar to what Ross had concluded: http://lkml.org/lkml/2004/5/3/157 and there is a workaround proposed for it here: http://lkml.org/lkml/2004/5/3/168 I'm moving this bug out of the ACPI category. The timer issue didn't turn out to be related to the cause of this hang, but a workaround for that is already in the 2.6 pipeline.
I'd just like to make it known that as of kernel 2.6.6 all of the problems described here have been fixed (for me at least). All interrupts are now correctly using APIC (thanks Len) and I don't get any hard freezes (C1 disconnect problem fixed by nVidia releasing the information) Good work! A job very well done. I'm very happy with things thus far. CPU temperature has dropped by ~8 degrees C too :) Thanks.
trying the 2.6.6 kernel with apic/io-apic and boot option "acpi_skip_timer_override" to make the timer irq work in io-apic mode, i get some "rtc: lost some interrupts at 1024Hz." log messages when playing videos in mplayer. without "acpi_skip_timer_override" the timer works in xt-pic mode and there are no lost interrupts. this is with an nforce2 asus a7n8x-dlx 1.04, bios 1007 (latest) & fedora2 test3, 2.6.6 kernel Thanks!
Vanilla 2.6.6 still did not fix the lockups on my system (Epox 8RDA3+, Athlon 2700+)... I get the nForce2 APIC fixup message during boot and /proc/interrupts looks fine (no XT-PIC hanging around). Booting with noapic nolapic and acpi=off seems to make things much more stable, but I haven't done a long-running test to verify that yet. I think 2.6.5 was broken as well, but 2.6.3 and/or 2.6.4 was fine as far as I recall. In particular, IO-APIC on this Epox motherboard only started working around 2.6.3 (I believe it is in the kernel Changelog); before that it was unstable and locked up. I thought 2.6.6 would fix this problem for me, since everyone else was reporting good news... but alas I am out of luck. Are there some boot parameters I should try, like idle=C1halt or something?
I am currently running 2.6.7-rc1-mm1 with no additional patches and my NForce2 chipset and Nvidia 5700 based video card are working fine with APIC and IOAPIC turned on and Nvidia binary video drivers.

hikaru root # cat /proc/cmdline
BOOT_IMAGE=Tux-as149,as ro root=306 reboot=warm gentoo=nodevfs
hikaru root # cat /proc/interrupts
           CPU0
  0:   44471250    IO-APIC-edge  timer
  1:       6590    IO-APIC-edge  i8042
  8:          2    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 14:      41699    IO-APIC-edge  ide0
 15:      22120    IO-APIC-edge  ide1
 19:    3549114   IO-APIC-level  eth0, nvidia
 20:       3620   IO-APIC-level  ehci_hcd, NVidia nForce2
 21:         41   IO-APIC-level  ohci_hcd
 22:     127575   IO-APIC-level  ohci1394, ohci_hcd
NMI:          0
LOC:   44370416
ERR:          0
MIS:          0
Created attachment 3049 [details] nForce2 fixup.c fixup Finally... I found the problem with my nForce2 system... the patch that is in 2.6.6 is not generic enough to work on all systems. Specifically the C1halt sets the config dword to either 0x9f01ff01 or 0x1f01ff01 assuming that it was either 0x9f0fff01 or 0x1f0fff01 to begin with... however in my case it was actually 0x8f0fff01 to start and changing it to 0x9f0fff01 still caused hard locks on my system! I made a patch against 2.6.6 that fixes the issue on my system and should be generic enough to work on all systems. Flipping that 0xF nibble to 0x1 seems more correct to me, not to mention that my system doesn't hang any longer (IO-APIC, APIC, ACPI, PREEMPT, Kernel 2.6.6).
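To make the arithmetic in the comment above concrete, here is a small sketch of "flipping that 0xF nibble to 0x1" while leaving every other bit of the config dword alone. The mask and constant names are assumptions chosen to match the values quoted in the comment; this is not the code of the attached patch, and it models only the arithmetic, not the PCI config-space access.

```python
# Sketch of the generic fix-up arithmetic described in the comment above.
# Constant names and the exact mask are assumptions inferred from the quoted
# values (0x9f0fff01 -> 0x9f01ff01), not taken from the attached patch.
NIBBLE_MASK = 0x000F0000   # the nibble that reads 0xF on affected systems
NIBBLE_FIXED = 0x00010000  # the value 0x1 the comment says it should have


def fix_config_dword(value):
    """Set the masked nibble to 0x1 and preserve all other bits."""
    return ((value & ~NIBBLE_MASK) | NIBBLE_FIXED) & 0xFFFFFFFF


for before in (0x9F0FFF01, 0x1F0FFF01, 0x8F0FFF01):
    print(f"{before:#010x} -> {fix_config_dword(before):#010x}")
```

Because the top byte is preserved rather than overwritten, a board that starts at 0x8f0fff01 keeps its 0x8f prefix instead of being forced to one of two hard-coded values.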
"acpi_skip_timer_override" patch needed in 2.4.x kernels also. I have Abit NF7- S Rev2 with latest BIOS but I still obtain an XT-PIC timer with an ACPI-enabled kernel. I do not wish to switch to a 2.6 kernel at this time.
Just to expand on this, my kernel is, of course, APIC-enabled, as well as ACPI-enabled. It is 2.4.26. I get an XT-PIC timer. I don't know whether this is related but I see clock gain of about one minute per week. I'd just like to know whether the above 2.6.5 acpi_skip_timer_override patch can be applied to 2.4.x kernels (obviously with offsets) and whether there are any plans to include it in mainstream 2.4 kernels. Or should I use Ross Dickson's patch here: http://marc.theaimsgroup.com/?l=linux-kernel&m=108181916319469&w=2 ? There has been a lot of discussion of this and related issues but almost all public discussion has related to patching 2.6 kernels, with the exception of the C1 disconnect issue (resolved in 2.4.27) and one or two unofficial patch threads (see above link). Please reply here - I think my subscribed e-mail address has problems.
Has this bug been resolved with current stable kernel?
As a direct follow-on from my previous post, everything was fine until one day I upgraded my BIOS and it stopped working, and a line beginning with "Abit NF7-S" (my motherboard) disappeared from dmesg. Since then I've had to include the attached patch to keep my timer as IO-APIC-edge. The patch inserts a new BIOS entry into a table in arch/i386/kernel/dmi_scan.c.

##
hikaru patches # cat /proc/interrupts
           CPU0
  0:   78797244    IO-APIC-edge  timer
  1:       2516    IO-APIC-edge  i8042
  8:          2    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 14:      27465    IO-APIC-edge  ide0
 15:      28246    IO-APIC-edge  ide1
 19:    5955211   IO-APIC-level  nvidia
 20:       2994   IO-APIC-level  ehci_hcd, NVidia nForce2
 21:          3   IO-APIC-level  ohci1394, ohci_hcd
 22:    9539769   IO-APIC-level  ohci_hcd, eth0
NMI:          0
LOC:   78400155
ERR:          0
MIS:          0
hikaru patches # fgrep APIC ~/k/config-2.6.10-rc2-bk7-as328
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_UP_APIC=y
CONFIG_X86_UP_IOAPIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
hikaru patches # fgrep ACPI ~/k/config-2.6.10-rc2-bk7-as328
# Power management options (ACPI, APM)
# ACPI (Advanced Configuration and Power Interface) Support
CONFIG_ACPI=y
CONFIG_ACPI_BOOT=y
CONFIG_ACPI_INTERPRETER=y
# CONFIG_ACPI_SLEEP is not set
CONFIG_ACPI_AC=y
CONFIG_ACPI_BATTERY=y
CONFIG_ACPI_BUTTON=y
CONFIG_ACPI_VIDEO=y
CONFIG_ACPI_FAN=y
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_THERMAL=y
# CONFIG_ACPI_ASUS is not set
# CONFIG_ACPI_IBM is not set
# CONFIG_ACPI_TOSHIBA is not set
CONFIG_ACPI_BLACKLIST_YEAR=0
# CONFIG_ACPI_DEBUG is not set
CONFIG_ACPI_BUS=y
CONFIG_ACPI_EC=y
CONFIG_ACPI_POWER=y
CONFIG_ACPI_PCI=y
CONFIG_ACPI_SYSTEM=y
CONFIG_HOTPLUG_PCI_ACPI=m
# CONFIG_HOTPLUG_PCI_ACPI_IBM is not set
# CONFIG_SERIAL_8250_ACPI is not set
Created attachment 4183 [details] new BIOS entry in dmi_scan.c table # Patch for previous entry. (Oops.)
OK, that symptom is true of a lot of nforce2 bugs and it is indeed BIOS related. There is currently a DMI entry for your system, but the dates are different:

{ ignore_timer_override, "Abit NF7-S v2", {
        MATCH(DMI_BOARD_VENDOR, "http://www.abit.com.tw/"),
        MATCH(DMI_BOARD_NAME, "NF7-S/NF7,NF7-V(nVidia-nForce2)"),
        MATCH(DMI_BIOS_VERSION, "6.00 PG"),
        MATCH(DMI_BIOS_DATE, "03/24/2004") }},

I'm going to close this bug with resolution CODE_FIX; we can submit a patch to take care of you if needed (please test the latest -mm kernel). Thanks
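Why a BIOS upgrade silently disables the workaround can be illustrated with a toy model of the table entry above. This sketch assumes an entry applies only when every listed field matches, and that a match is a substring test; the function name and the newer BIOS date are hypothetical, used only for illustration.

```python
# Toy model of the DMI table entry quoted above. Assumptions: all fields must
# match, and each match is a substring test. entry_matches and the "newer"
# BIOS date are hypothetical.
ENTRY = {
    "DMI_BOARD_VENDOR": "http://www.abit.com.tw/",
    "DMI_BOARD_NAME": "NF7-S/NF7,NF7-V(nVidia-nForce2)",
    "DMI_BIOS_VERSION": "6.00 PG",
    "DMI_BIOS_DATE": "03/24/2004",
}


def entry_matches(entry, system):
    """True only if every pattern in the entry is found in the system's field."""
    return all(pattern in system.get(field, "") for field, pattern in entry.items())


old_bios = dict(ENTRY)                          # BIOS the entry was written for
new_bios = dict(ENTRY, DMI_BIOS_DATE="07/28/2004")  # hypothetical later BIOS

print(entry_matches(ENTRY, old_bios))
print(entry_matches(ENTRY, new_bios))
```

A single changed field, here the BIOS date, is enough for the entry to stop applying, which fits the reporter's observation that the "Abit NF7-S" line vanished from dmesg after the upgrade.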
Note that the DMI entries for ignore_timer_override are obsolete. They're now handled automatically, based on PCI ID, by the pair of patches in bug #3551, which are already in the -mm tree.
Simply to confirm that indeed 2.6.10-rc3-bk13 does not require the previous patch submitted by me.