Bug 5177 - intermittent system hang - over temperature? - ASUS P5AD2
Status: REJECTED UNREPRODUCIBLE
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Thermal
Hardware: i386 Linux
Importance: P2 normal
Assignee: Len Brown
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-09-03 03:55 UTC by Alex Unigovsky
Modified: 2007-08-18 22:41 UTC

See Also:
Kernel Version: 2.6.13
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
Output of lspci -vvx on 2.6.13 with ACPI2.0/APIC turned off (23.15 KB, text/plain)
2005-09-03 03:58 UTC, Alex Unigovsky
Full dmesg of system booting 2.6.13-mm1 (with vesafb-tng and sk98lin patches) and ACPI2.0/APIC turned off in BIOS (23.75 KB, text/plain)
2005-09-05 09:33 UTC, Alex Unigovsky

Description Alex Unigovsky 2005-09-03 03:55:32 UTC
Most recent kernel where this bug did not occur:
2.6.12-rc4 (maybe more recent, I didn't check)
Distribution: Gentoo Linux

Hardware Environment:

ASUS P5AD2 Premium motherboard (925X chipset)
lspci (on 2.6.13 with ACPI 2.0/APIC disabled in BIOS):

0000:00:00.0 Host bridge: Intel Corporation 925X/XE Memory Controller Hub (rev 04)
0000:00:01.0 PCI bridge: Intel Corporation 925X/XE PCI Express Root Port (rev 04)
0000:00:1c.0 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) PCI Express Port 1 (rev 03)
0000:00:1c.1 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) PCI Express Port 2 (rev 03)
0000:00:1c.2 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) PCI Express Port 3 (rev 03)
0000:00:1d.0 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI #1 (rev 03)
0000:00:1d.1 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI #2 (rev 03)
0000:00:1d.2 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI #3 (rev 03)
0000:00:1d.3 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI #4 (rev 03)
0000:00:1d.7 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB2 EHCI Controller (rev 03)
0000:00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d3)
0000:00:1f.0 ISA bridge: Intel Corporation 82801FB/FR (ICH6/ICH6R) LPC Interface Bridge (rev 03)
0000:00:1f.1 IDE interface: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) IDE Controller (rev 03)
0000:00:1f.2 IDE interface: Intel Corporation 82801FR/FRW (ICH6R/ICH6RW) SATA Controller (rev 03)
0000:00:1f.3 SMBus: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) SMBus Controller (rev 03)
0000:01:03.0 FireWire (IEEE 1394): Texas Instruments TSB82AA2 IEEE-1394b Link Layer Controller (rev 01)
0000:01:04.0 Unknown mass storage controller: Integrated Technology Express, Inc. IT/ITE8212 Dual channel ATA RAID controller (PCI version seems to be IT8212, embedded seems (rev 13)
0000:01:05.0 Unknown mass storage controller: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
0000:01:09.0 Multimedia audio controller: Creative Labs SB Audigy (rev 04)
0000:01:09.1 Input device controller: Creative Labs SB Audigy MIDI/Game port (rev 04)
0000:01:09.2 FireWire (IEEE 1394): Creative Labs SB Audigy FireWire Port (rev 04)
0000:02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 Gigabit Ethernet Controller (rev 15)
0000:03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 Gigabit Ethernet Controller (rev 15)
0000:05:00.0 VGA compatible controller: nVidia Corporation NV35 [GeForce PCX 5900] (rev a2)

Drives:

2x Western Digital 74GB 10kRPM (sda, sdb) on SiL3114
1x Seagate 200GB (sdc) on SiL3114
1x CD-ROM (hda) on ICH as primary master

See attachment for lspci -vvx output

Tried updating to the most recent BIOS version (1010) but it didn't help either.

Software Environment:

Kernel 2.6.13
Also tried 2.6.13-rc{5,6,7} and 2.6.13-mm1 with no positive result.

Problem Description:

When booting, the kernel hangs right after the "applying bridge limits" message
during SiL3114 initialization. If I boot with "noapic", the system boots fine
but soon becomes unstable and hangs. The only way to boot reliably is to disable
ACPI 2.0 and APIC in the BIOS, but doing so loses the second (SMT) logical CPU.
Apart from that, the system boots fine and performance is good. See /proc/cpuinfo:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 3
model name      : Intel(R) Pentium(R) 4 CPU 3.40GHz
stepping        : 4
cpu MHz         : 3673.408
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pni monitor ds_cpl cid xtpr
bogomips        : 7356.59
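
The output above lists only processor 0 even though "siblings" is 2, which is how the lost SMT CPU shows up. A minimal sketch of checking this automatically, assuming the text comes from /proc/cpuinfo; the function names parse_cpuinfo and missing_siblings are hypothetical helpers, not part of any existing tool:

```python
def parse_cpuinfo(text):
    """Split /proc/cpuinfo text into one dict per logical processor."""
    cpus = []
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:  # skip lines without a "key : value" shape
                fields[key.strip()] = value.strip()
        if "processor" in fields:
            cpus.append(fields)
    return cpus


def missing_siblings(cpus):
    """Count logical CPUs that packages advertise but the kernel did not list."""
    by_package = {}
    for cpu in cpus:
        by_package.setdefault(cpu.get("physical id", "0"), []).append(cpu)
    missing = 0
    for members in by_package.values():
        # "siblings" is the number of logical CPUs the package reports.
        expected = int(members[0].get("siblings", len(members)))
        missing += max(0, expected - len(members))
    return missing
```

Calling missing_siblings(parse_cpuinfo(open("/proc/cpuinfo").read())) on the configuration described here would be expected to report one missing logical CPU, and zero once both HT siblings are online.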

Steps to reproduce:

1. Compile a 2.6.13 kernel with SCSI generic/disk and SiL3114 driver built in.
2. Boot the new kernel.
3. Wait forever after the kernel stalls :)

PS. I'm assigning this to ACPI because I guess it's the place to look at, but
I'm not sure.
Comment 1 Alex Unigovsky 2005-09-03 03:58:15 UTC
Created attachment 5875
Output of lspci -vvx on 2.6.13 with ACPI2.0/APIC turned off

If I need to supply additional info, please let me know. I'll do my best :)
Comment 2 Alex Unigovsky 2005-09-05 09:33:48 UTC
Created attachment 5910
Full dmesg of system booting 2.6.13-mm1 (with vesafb-tng and sk98lin patches) and ACPI2.0/APIC turned off in BIOS

Maybe this can help... But I see that this bug is just ignored... :(
Comment 3 Len Brown 2005-09-07 18:51:19 UTC
If you have ACPI and the IOAPIC enabled in the BIOS,
can you boot with "acpi=noirq" or "pci=noacpi" and
get working devices and HT at the same time?
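
To confirm which of these parameters a given boot actually used, the kernel command line can be inspected from userspace. A minimal sketch, assuming the command line is read from /proc/cmdline; the helper name acpi_boot_options is hypothetical, and the set of parameters it looks for is only the handful mentioned in this report:

```python
def acpi_boot_options(cmdline):
    """Return the ACPI/APIC-related parameters found on a kernel command line."""
    # Prefixes of interest; "acpi=" covers acpi=off and acpi=noirq.
    interesting = ("acpi=", "pci=noacpi", "noapic", "nolapic")
    return [tok for tok in cmdline.split() if tok.startswith(interesting)]
```

For example, acpi_boot_options(open("/proc/cmdline").read()) returns the matching parameters in the order they appear, or an empty list when none were passed.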
Comment 4 Alex Unigovsky 2005-09-08 04:31:06 UTC
Well, yes, acpi=noirq helped. Now I see 2 CPUs and am able to boot and use the SiL
controller. Thanks a lot. I didn't try pci=noacpi; should I?

The only thing that bugs me is this "ACPI Interpreter Disabled" thing in dmesg,
but I think that I'm supposed to see it :)

How else can I help?
Comment 5 Len Brown 2005-09-14 23:56:39 UTC
ACPI and APIC were enabled in BIOS for the "acpi=noirq" boot, yes?
Please attach the complete dmesg and /proc/interrupts from this boot.

Also, please attach the output from acpidump, available in pmtools here:
http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/utils

Are you able to capture the "debug" console output for the failure case?
If the dmesg above shows no clues we'll need it.
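
When comparing boots with and without "acpi=noirq", it can help to summarize which interrupt controller each IRQ in /proc/interrupts is routed through. A minimal sketch, assuming the usual layout of a CPU header line followed by "label: counts description" rows; parse_interrupts and controllers_in_use are hypothetical names, and the controller is simply taken to be the first word of the description:

```python
from collections import Counter


def parse_interrupts(text):
    """Parse /proc/interrupts text into (label, per-CPU counts, description) rows."""
    lines = text.strip("\n").splitlines()
    ncpus = len(lines[0].split())  # header looks like "CPU0 CPU1 ..."
    rows = []
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        counts = []
        for p in parts[:ncpus]:
            if not p.isdigit():
                break  # fewer count columns than CPUs on this row
            counts.append(int(p))
        desc = " ".join(parts[len(counts):])
        rows.append((label.strip(), counts, desc))
    return rows


def controllers_in_use(rows):
    """Count numbered IRQs by the first word of their description."""
    summary = Counter()
    for label, _counts, desc in rows:
        if label.isdigit() and desc:
            summary[desc.split()[0]] += 1
    return summary
```

On a boot where IRQ routing falls back to the legacy PIC, the summary would be expected to be dominated by XT-PIC entries rather than IO-APIC ones.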
Comment 6 Alex Unigovsky 2005-09-18 18:00:07 UTC
After booting into 2.6.14-rc1-mm1 I had no more error/warning messages when
booting with ACPI and APIC enabled, and I didn't have to issue acpi=noirq.
Hyper-threading worked fine, and all ATA/SATA/whatever controllers were
initialized successfully (the only strange thing I noted is the
longer-than-usual init time for IT8212, but it never gave any error). That is
good news.

The bad news is that the system is very unstable and freezes anywhere from about
30 minutes to several hours after booting. I have no clue how to collect
diagnostics/debugging info, because I cannot predict the freeze - X just stops
responding and the keyboard goes down completely (i.e. Num Lock no longer toggles).

Can you tell me a way that I can diagnose a problem in this situation? Or maybe
some suggestions...

Kernel 2.6.14-rc1-mm1 was compiled with GCC 4.0.1. Dmesg/interrupts/acpidump/etc
will follow shortly. The only other patch applied was sk98lin from
syskonnect.com, because sky2 (even with enabled workarounds) still didn't work
for me.

PS: Sorry if these problems are unrelated, I'm quite new to kernel bugzilla, and
didn't know where to post. And sorry for my bad English :)
Comment 7 Alex Unigovsky 2005-09-24 05:46:57 UTC
On 2.6.14-rc2 (vanilla + reiser4 from -mm) ACPI works fine (no errors,
everything is detected and initialized correctly, HT works). I will post further
info after I get a more powerful CPU cooler, because I suspect the freezes during
use are caused by the CPU temperature (according to sensors) reaching 70 Celsius
and staying there. Sorry for the delay.
Comment 8 Alex Unigovsky 2005-09-24 06:10:34 UTC
Forgot to say, the temperature issue only exists on 2.6.13 and 2.6.14. On 2.6.12,
the CPU temperature always stays at 54-56 Celsius.
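
Because the freeze cannot be predicted, one way to test the over-temperature theory is to log the temperature continuously and force every sample to disk, so the last reading before a hang survives the reset. A minimal sketch under stated assumptions: the sensor file is either a procfs-style ACPI thermal zone reading (a line such as "temperature: 54 C") or a sysfs-style value in millidegrees, whether this board exposes either is not confirmed in this report, and all function names are hypothetical:

```python
import os
import time


def parse_temperature(text):
    """Return degrees Celsius from a procfs-style or sysfs-style reading."""
    text = text.strip()
    if text.startswith("temperature:"):
        # procfs style, e.g. "temperature:             54 C"
        return float(text.split(":", 1)[1].split()[0])
    # sysfs style: an integer in millidegrees, e.g. "54000"
    return int(text) / 1000.0


def append_durably(path, line):
    """Append one line and fsync it so it survives a hard freeze."""
    with open(path, "a") as f:
        f.write(line + "\n")
        f.flush()
        os.fsync(f.fileno())


def log_temperature(sensor_path, log_path, interval=5.0, samples=None):
    """Sample the sensor every `interval` seconds; run forever if samples is None."""
    taken = 0
    while samples is None or taken < samples:
        with open(sensor_path) as f:
            temp = parse_temperature(f.read())
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        append_durably(log_path, "%s %.1f" % (stamp, temp))
        taken += 1
        if samples is None or taken < samples:
            time.sleep(interval)
```

After a freeze and reboot, the tail of the log shows the last temperature recorded before the hang, which can then be compared between 2.6.12 and the later kernels.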
Comment 9 Natalie Protasevich 2007-08-06 00:01:18 UTC
It looks like the problem was fixed. Any objections to closing the bug?
Thanks.
Comment 10 Len Brown 2007-08-18 22:41:01 UTC
If you are still seeing a problem, please re-open this bug report and
test whether it still happens with CONFIG_HWMON=n and ACPI enabled,
or with "acpi=off".

If it fails with ACPI enabled, but works with "acpi=off",
please attach the complete dmesg from the ACPI boot
and the output from acpidump.
