Bug 4319 - Strange lockups of timer interrupt (irq 0)
Summary: Strange lockups of timer interrupt (irq 0)
Status: REJECTED UNREPRODUCIBLE
Alias: None
Product: ACPI
Classification: Unclassified
Component: Config-Interrupts
Hardware: i386 Linux
Importance: P2 normal
Assignee: acpi_config-interrupts
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-03-09 08:39 UTC by Miroslaw Mieszczak
Modified: 2006-09-11 14:48 UTC
CC List: 4 users

See Also:
Kernel Version: 2.6.xx
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
lspci (11.48 KB, text/plain)
2005-03-09 08:43 UTC, Miroslaw Mieszczak
dmesg (18.08 KB, text/plain)
2005-03-09 08:44 UTC, Miroslaw Mieszczak
/proc/interrupts (785 bytes, text/plain)
2005-03-09 08:45 UTC, Miroslaw Mieszczak
dsdt ACPI table (16.44 KB, application/octet-stream)
2005-03-09 08:46 UTC, Miroslaw Mieszczak
lspci -vv (6.95 KB, text/plain)
2006-02-08 22:29 UTC, Geoff Lywood

Description Miroslaw Mieszczak 2005-03-09 08:39:35 UTC
Distribution: Gentoo, but tried with others too.

Problem Description:
When the kernel is configured with SMP and HT together with ACPI, interrupt 0
stops counting after 5 minutes to 1 hour.
The strange thing is that when irq0 stops, the local interrupt counters keep counting.
If I don't load the ACPI ac module, the problem does not seem to occur.
Also, a system configured without SMP seems to be stable.
Steps to reproduce:
Run the system with SMP, load the ac module, and wait.
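
For anyone trying to catch the moment irq0 stops, the check described above (irq0 frozen while the local timer counters still advance) can be sketched by comparing two snapshots of /proc/interrupts. This is only an illustrative sketch: the function names parse_interrupts and stalled are made up for this example, and it assumes the usual layout of a CPU header row followed by "label: per-CPU counts ..." rows.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts text into {label: total count across CPUs}."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        # Per-CPU counts come first; rows such as ERR/MIS carry fewer columns.
        counts = [int(f) for f in fields[:ncpus] if f.isdigit()]
        totals[label.strip()] = sum(counts)
    return totals


def stalled(before, after, irq="0", reference="LOC"):
    """True if `irq` did not advance while `reference` did (hypothetical helper)."""
    return after[irq] == before[irq] and after[reference] > before[reference]
```

To use it, read /proc/interrupts twice a few seconds apart, pass both texts through parse_interrupts, and call stalled(first, second); a True result matches the symptom reported here.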
Comment 1 Miroslaw Mieszczak 2005-03-09 08:43:52 UTC
Created attachment 4698
lspci
Comment 2 Miroslaw Mieszczak 2005-03-09 08:44:28 UTC
Created attachment 4699
dmesg
Comment 3 Miroslaw Mieszczak 2005-03-09 08:45:03 UTC
Created attachment 4700
/proc/interrupts
Comment 4 Miroslaw Mieszczak 2005-03-09 08:46:35 UTC
Created attachment 4701
dsdt ACPI table
Comment 5 Matt Domsch 2005-07-27 21:15:19 UTC
I've seen something exactly like this, though "acpi=off" doesn't resolve it. 
The only workaround has been to use a uniprocessor kernel.

I originally opened this bug in
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=161153
though I can confirm that 2.6.12-rc3 and 2.6.13-rc3 behave similarly. 
readprofile data is available there, but nothing is implicated.

My system is as follows:


Description of problem:
Fedora Core 4, fresh install, though same seen after FC3->FC4 upgrade too.
Dell PowerEdge 2400, 2x933MHz, 1GB RAM, built-in e100 network, several disks on
onboard aic7890 controller, using LVM on one disk for boot, md raid 1 + lvm1 on
two disks for /home.

System initially is fine, but after a few minutes, system becomes sluggish. 
Gnome system monitor tool stops refreshing every second, becomes every few
minutes.  top hangs.  Can no longer sudo or log in, but can start new
gnome-terminals with Ctrl-T.  Can no longer ping the ethernet device from outside.  

Switching from VT7 to VT1 succeeds, but cannot log in.  SysRQ works there
though.  Nothing unusual on the task lists.   SysRQ-M shows plenty of free
memory, -P shows both CPUs in idle loop.  Outgoing network connections
occasionally OK, though mostly hung.  Rebooting via sysrq-b works.  Emergency
sync claims to work, but the data appears not to be committed to journal or
disk, as it's not present after reboot.  If I edit files while in this state,
those changes do not persist after reboot, even after sysrq-s.

Timer interrupts have stopped counting.  With 2.6.13-rc3, all interrupts
appeared to stop; with all FC4-smp kernels, only the disk and timer
interrupts appeared to stop, while network interrupts continued to function.

Tried with acpi=off, selinux=0, audit=0 in various combinations, no effect.

Fedora Core 3 SMP kernel did not behave like this, was running for 3-4 months
with no ill behavior. Likewise RHL9 kernels on the same system before that for
several years.

This is the strangest thing I've seen in a while.


Version-Release number of selected component (if applicable):
FC4 gold release SMP i686 kernel

How reproducible:
on every boot
Comment 6 Matt Domsch 2005-08-27 12:55:33 UTC
Progress.
Booting with 'clock=tsc' seems to work as a workaround on my system.  This isn't
the root cause, but it is a start.  I tried kernels 2.6.13-rc{2,4,6,7}: same behavior,
and the same workaround applies. Disabling CONFIG_X86_PM_TIMER also works.
Comment 7 Matt Domsch 2005-08-28 06:23:40 UTC
For the record, Miroslaw's system appears to have an Intel ICH5 southbridge,
while mine has a ServerWorks OSB4 southbridge.
Comment 8 Matt Domsch 2005-08-28 20:28:25 UTC
Well, it *was* working fine for >24 hours with 'clock=tsc' on Fedora kernel 
2.6.12-1.1398_FC4smp, then it hung the same way again, with timer interrupts 
no longer being received.  So it's much better, but 'clock=tsc' doesn't 
completely solve it.
Comment 9 Miroslaw Mieszczak 2005-08-28 21:43:25 UTC
For me it works if the ACPI AC module is not started (when I compile it as a module and
don't load it, or don't compile it at all).

Yesterday I tried the kernel parameter "noapicinterrupt", and it works (also
with the AC module). The system seems to be faster (I don't understand why).
Comment 10 Miroslaw Mieszczak 2005-10-29 02:10:59 UTC
Today I checked the new kernel 2.6.14, and the problem still exists.
Is there any way to check the state of the APIC?
Maybe then it would be possible to detect what is going on.
Comment 11 Miroslaw Mieszczak 2006-01-06 11:39:10 UTC
I tried kernel 2.6.15, and the problem still exists.

But I noticed something: the PCI latency timer settings under Windows and Linux are
different:
- for the Atheros card, the latency timer is 168 under Linux (128 under Windows);
- for the PCMCIA port, it is 168 under Linux (64 under Windows).

Could these differences have any impact on this problem?
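
To compare latency timer values across devices without reading lspci -vv output by hand, something like the following could be used. It is a rough sketch, not a tested tool: it assumes each device's configuration space is exposed as /sys/bus/pci/devices/<address>/config, relies on the latency timer being the single byte at offset 0x0D of the standard PCI configuration header, and the function names are invented for this example.

```python
import glob
import os

LATENCY_TIMER_OFFSET = 0x0D  # one-byte field in the PCI configuration header


def latency_timer(config_bytes):
    """Return the latency timer value from raw PCI config-space bytes."""
    if len(config_bytes) <= LATENCY_TIMER_OFFSET:
        raise ValueError("config space too short")
    return config_bytes[LATENCY_TIMER_OFFSET]


def all_latency_timers(root="/sys/bus/pci/devices"):
    """Map each PCI address under `root` (assumed sysfs path) to its latency timer."""
    result = {}
    for path in sorted(glob.glob(os.path.join(root, "*", "config"))):
        with open(path, "rb") as f:
            address = os.path.basename(os.path.dirname(path))
            result[address] = latency_timer(f.read(64))
    return result
```

Calling all_latency_timers() on the affected machine would list the values to compare against the ones seen under Windows. Whether the difference matters for the irq0 lockup is an open question in this report.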
Comment 12 Len Brown 2006-01-17 22:24:05 UTC
>           CPU0       CPU1       
>  0:   14752970         16    IO-APIC-edge  timer
>  1:         10       8660    IO-APIC-edge  i8042
>  8:          2          0    IO-APIC-edge  rtc
>  9:     113734          0   IO-APIC-level  acpi

This is a lot of ACPI interrupts....

> For me it works if AC of ACPI is not started 

> Yesterday I tried with kernel parameter "noapicinterrupt",
>  and it works (also with AC module).
>  The system seems to be faster (I don't understad it).

Matt, do you see this too?
Comment 13 Matt Domsch 2006-01-18 08:27:20 UTC
I don't have any acpi interrupts.
Wed Jan 18 10:34:59 CST 2006
           CPU0       CPU1
  0:     154069     236996    IO-APIC-edge  timer
  1:         43         69    IO-APIC-edge  i8042
  5:          1          0    IO-APIC-edge  SoundBlaster
  7:          0          0    IO-APIC-edge  parport0
  8:          1          0    IO-APIC-edge  rtc
  9:          0          0   IO-APIC-level  acpi
 11:          0          0   IO-APIC-level  ohci_hcd:usb1
 12:       4515       4503    IO-APIC-edge  i8042
177:       5656       5947   IO-APIC-level  ide2
185:      32811      32489   IO-APIC-level  aic7xxx
193:      31144      31285   IO-APIC-level  aic7xxx
201:       4700       4722   IO-APIC-level  eth0
NMI:        632        561
LOC:     422387     422211
ERR:          0
MIS:          0

However, now it's hung, only eth0 and LOC interrupts go up.  This is 2.6.16-rc1.
It generally fails for me when I'm in X, running Firefox.

I'll try noapicinterrupt
Comment 14 Matt Domsch 2006-01-18 11:05:45 UTC
On i386, noapicinterrupt doesn't exist as an option, but it will get parsed as
noapic.  Is that what you intended?
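
A small illustration of why that happens, assuming the boot-parameter parser compares only the leading characters of each argument against the option string it knows about (prefix matching). That assumption fits the behavior described above but is not a quote of the kernel source; the option list and the match_option name are hypothetical.

```python
KNOWN_OPTIONS = ["noapic", "nolapic", "acpi=off"]  # illustrative subset only


def match_option(arg, known=KNOWN_OPTIONS):
    """Return the first known option that is a prefix of `arg`, or None."""
    for opt in known:
        # Mimics a C-style strncmp(arg, opt, strlen(opt)) == 0 comparison.
        if arg[:len(opt)] == opt:
            return opt
    return None
```

Under this model match_option("noapicinterrupt") returns "noapic", so the misspelled parameter would silently disable the IO-APIC rather than being rejected.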
Comment 15 Geoff Lywood 2006-02-08 22:26:58 UTC
I'm having the same problem here. Is there anything I can do to help get this
problem fixed so I can have working ACPI on my machine?

Post #5 pretty well sums up my symptoms, but I'm on (not entirely) different
hardware:
Gentoo 2.6.15-suspend2 and Ubuntu 2.6.12-10-686 (dual boot).
IBM NetVista 8311-KWU, Celeron 2.2 GHz, 256 MB RAM. I'll attach my lspci -vv, as
at first glance it looks suspiciously similar to the one already posted.

It generally takes me a little longer to reproduce this problem, more on the
order of half an hour to 5 hours, but I've never seen it stay working overnight.

This is a desktop system, so at the moment I'm using a gentoo kernel without
ACPI support built in. No ACPI means no problem, but also no features. I would
like to get ACPI working so I can hibernate, have the computer automatically
turn off, etc.
Comment 16 Geoff Lywood 2006-02-08 22:29:14 UTC
Created attachment 7274
lspci -vv
Comment 17 Miroslaw Mieszczak 2006-05-08 09:15:41 UTC
I don't have this hardware anymore. 
So even if the problem were fixed, I could no longer verify it.
Comment 18 john stultz 2006-08-22 16:45:13 UTC
Geoff: I suspect your issue is a different issue that has been seen w/ IBM
Netvistas and ThinkCentres. A BIOS update is usually the correct fix.

Matt: Do you still see this issue w/ 2.6.18-rc4?
Comment 19 Matt Domsch 2006-08-22 20:25:32 UTC
Unfortunately, I no longer have the problematic hardware to test with.
Comment 20 Len Brown 2006-09-11 14:48:54 UTC
Please re-open if this issue is reproducible.
