Distribution: Gentoo, but also tried with others.
Problem Description: When the kernel is configured with SMP and HT together with ACPI, interrupt 0 stops counting after 5 minutes to 1 hour. The strange thing is that when IRQ 0 stops, the local interrupt counters are still counting. If I don't load the ACPI ac module, the problem does not seem to occur. A system configured without SMP also seems to be stable.
Steps to reproduce: run the system with SMP, load the ac module, and wait.
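For anyone trying to catch the moment this happens, here is a minimal sketch of a watcher that compares two snapshots of /proc/interrupts and reports when the IRQ 0 row stops advancing while the LOC row still does. It assumes a kernel with a periodic timer tick and the usual /proc/interrupts layout (a CPU header line followed by "label: count count ..." rows). The function names and the 10 second interval are my own choices, not anything from the kernel.

```python
import time


def parse_interrupts(text):
    """Map each row label (e.g. '0', 'LOC') to its count summed over all CPUs."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header line: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        total = 0
        # Per-CPU counts come first; stop at the first non-numeric field.
        for field in rest.split()[:ncpus]:
            if not field.isdigit():
                break
            total += int(field)
        totals[label.strip()] = total
    return totals


def stalled(before, after, irq="0", reference="LOC"):
    """True if `irq` did not advance between snapshots while `reference` did."""
    return after[irq] == before[irq] and after[reference] > before[reference]


def watch(path="/proc/interrupts", interval=10.0):
    """Poll `path` until IRQ 0 stalls while LOC keeps counting."""
    with open(path) as f:
        prev = parse_interrupts(f.read())
    while True:
        time.sleep(interval)
        with open(path) as f:
            cur = parse_interrupts(f.read())
        if stalled(prev, cur):
            print(time.ctime(), "IRQ 0 stopped counting while LOC is still advancing")
            return
        prev = cur
```

Calling watch() from a root shell or a console that is already open should be enough. Once the timer stops, starting new processes may no longer work reliably, so it is better to have it running beforehand.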
Created attachment 4698 [details] lspci
Created attachment 4699 [details] dmesg
Created attachment 4700 [details] /proc/interrupts
Created attachment 4701 [details] dsdt ACPI table
I've seen something exactly like this, though "acpi=off" doesn't resolve it. The only workaround has been to use a uniprocessor kernel. I originally opened this bug in https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=161153 though I can confirm that 2.6.12-rc3 and 2.6.13-rc3 behave similarly. readprofile data is available there, but nothing is implicated. My system is as follows:

Description of problem: Fedora Core 4, fresh install, though same seen after FC3->FC4 upgrade too. Dell PowerEdge 2400, 2x933MHz, 1GB RAM, built-in e100 network, several disks on onboard aic7890 controller, using LVM on one disk for boot, md raid 1 + lvm1 on two disks for /home.

System initially is fine, but after a few minutes, system becomes sluggish. Gnome system monitor tool stops refreshing every second, becomes every few minutes. top hangs. Can no longer sudo or log in, but can start new gnome-terminals with Ctrl-T. Can no longer ping the ethernet device from outside. Switching from VT7 to VT1 succeeds, but cannot log in. SysRQ works there though. Nothing unusual on the task lists. SysRQ-M shows plenty of free memory, -P shows both CPUs in idle loop. Outgoing network connections occasionally OK, though mostly hung.

Rebooting via sysrq-b works. Emergency sync claims to work, but the data appears not to be committed to journal or disk, as it's not present after reboot. If I edit files while in this state, those changes do not persist after reboot, even after sysrq-s.

Timer interrupts have stopped counting. With 2.6.13-rc3, all interrupts appeared to be stopped, with all FC4-smp kernels only the disk and timer interrupts appeared to stop, while network interrupts continued to function.

Tried with acpi=off, selinux=0, audit=0 in various combinations, no effect.

Fedora Core 3 SMP kernel did not behave like this, was running for 3-4 months with no ill behavior. Likewise RHL9 kernels on the same system before that for several years. This is the strangest thing I've seen in a while.
Version-Release number of selected component (if applicable): FC4 gold release SMP i686 kernel
How reproducible: on every boot
Progress. Booting with 'clock=tsc' seems to work as a workaround on my system. This isn't a root cause, but it's a start. I tried kernels 2.6.13-rc{2,4,6,7}: same behavior, and the same workaround helps, as does disabling CONFIG_X86_PM_TIMER.
For the record, Miroslaw's system appears to have an Intel ICH5 southbridge, while mine has a ServerWorks OSB4 southbridge.
Well, it *was* working fine for >24 hours with 'clock=tsc' on Fedora kernel 2.6.12-1.1398_FC4smp, then it hung the same way again, with timer interrupts no longer being received. So it's much better, but 'clock=tsc' doesn't completely solve it.
For me it works if the ACPI ac module is not started (when I compile it as a module and don't load it, or don't compile it at all). Yesterday I tried the kernel parameter "noapicinterrupt", and it works (also with the ac module loaded). The system seems to be faster (I don't understand why).
Today I checked the new kernel 2.6.14, and the problem still exists. Is there any way to check the state of the APIC? Maybe then it could be determined what is going on.
I tried kernel 2.6.15 and the problem still exists. But I observed something: the latency timer settings under Windows and Linux are different:
- for the Atheros card, the latency timer is 168 under Linux (128 under Windows);
- for the PCMCIA port, it is 168 under Linux (64 under Windows).
Could these differences have any impact on this problem?
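In case it helps to compare these values without lspci, here is a small sketch that reads the latency timer of every PCI device through sysfs. It assumes the values above are the PCI Latency Timer register (one byte at offset 0x0D of the standard configuration header) and that the kernel exposes config space under /sys/bus/pci/devices/<address>/config. The function names are mine.

```python
import os

# Latency Timer register: one byte at offset 0x0D of the PCI config header.
LATENCY_TIMER_OFFSET = 0x0D


def latency_timer(config):
    """Return the latency timer value from a raw config space dump (bytes)."""
    if len(config) <= LATENCY_TIMER_OFFSET:
        raise ValueError("config space dump too short")
    return config[LATENCY_TIMER_OFFSET]


def all_latency_timers(root="/sys/bus/pci/devices"):
    """Map each PCI address found under `root` to its latency timer value."""
    result = {}
    for addr in sorted(os.listdir(root)):
        with open(os.path.join(root, addr, "config"), "rb") as f:
            # The first 64 bytes (the standard header) are enough here.
            result[addr] = latency_timer(f.read(64))
    return result
```

Printing all_latency_timers() on the affected machine should show 168 for the two devices mentioned, if the assumption about which register is meant is right.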
>            CPU0       CPU1
>   0:   14752970         16    IO-APIC-edge  timer
>   1:         10       8660    IO-APIC-edge  i8042
>   8:          2          0    IO-APIC-edge  rtc
>   9:     113734          0   IO-APIC-level  acpi

This is a lot of ACPI interrupts....

> For me it works if AC of ACPI is not started
> Yesterday I tried with kernel parameter "noapicinterrupt",
> and it works (also with AC module).
> The system seems to be faster (I don't understand it).

Matt, do you see this too?
I don't have any acpi interrupts.

Wed Jan 18 10:34:59 CST 2006
           CPU0       CPU1
  0:     154069     236996    IO-APIC-edge  timer
  1:         43         69    IO-APIC-edge  i8042
  5:          1          0    IO-APIC-edge  SoundBlaster
  7:          0          0    IO-APIC-edge  parport0
  8:          1          0    IO-APIC-edge  rtc
  9:          0          0   IO-APIC-level  acpi
 11:          0          0   IO-APIC-level  ohci_hcd:usb1
 12:       4515       4503    IO-APIC-edge  i8042
177:       5656       5947   IO-APIC-level  ide2
185:      32811      32489   IO-APIC-level  aic7xxx
193:      31144      31285   IO-APIC-level  aic7xxx
201:       4700       4722   IO-APIC-level  eth0
NMI:        632        561
LOC:     422387     422211
ERR:          0
MIS:          0

However, now it's hung; only the eth0 and LOC interrupts go up. This is 2.6.16-rc1. It generally fails for me when I'm in X, running Firefox. I'll try noapicinterrupt.
On i386, noapicinterrupt doesn't exist as an option, but it will get parsed as noapic. Is that what you intended?
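To illustrate why an unknown string like this can still take effect, here is a toy sketch. It assumes the boot parameter parser only compares the leading characters of each argument against the option name, so anything that starts with "noapic" matches. The function and the option list are purely illustrative and are not kernel code.

```python
# Illustrative list of options matched by prefix; not taken from the kernel.
KNOWN_PREFIXES = ("noapic",)


def matched_option(arg, known=KNOWN_PREFIXES):
    """Return the known option that `arg` starts with, or None if none match."""
    for name in known:
        if arg.startswith(name):
            return name
    return None
```

Under that assumption, matched_option("noapicinterrupt") returns "noapic", while an unrelated argument such as "clock=tsc" matches nothing.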
I'm having the same problem here. Is there anything I can do to help get this problem fixed so I can have working ACPI on my machine? Post #5 pretty well sums up my symptoms, but I'm on (not entirely) different hardware: Gentoo 2.6.15-suspend2 and Ubuntu 2.6.12-10-686 (dual boot). IBM NetVista 8311-KWU, Celeron 2.2 GHz, 256 MB RAM. I'll attach my lspci -vv, as at first glance it looks suspiciously similar to the one already posted. It generally takes me a little longer to reproduce this problem, more on the order of half an hour to 5 hours, but I've never seen it stay working overnight. This is a desktop system, so at the moment I'm using a gentoo kernel without ACPI support built in. No ACPI means no problem, but also no features. I would like to get ACPI working so I can hibernate, have the computer automatically turn off, etc.
Created attachment 7274 [details] lspci -vv
I don't have this hardware anymore, so even if the problem were fixed, I could not verify it.
Geoff: I suspect your issue is a different issue that has been seen w/ IBM Netvistas and ThinkCentres. A BIOS update is usually the correct fix. Matt: Do you still see this issue w/ 2.6.18-rc4?
Unfortunately, I no longer have the problematic hardware to test with.
Please re-open if this issue is reproducible.