Bug 10245
Summary: | lm_sensors causes ACPI errors and critical thermal shutdown | ||
---|---|---|---|
Product: | Drivers | Reporter: | Chuck Ebbert (cebbert) |
Component: | I2C | Assignee: | Jean Delvare (jdelvare) |
Status: | CLOSED CODE_FIX | ||
Severity: | high | CC: | acpi-bugzilla, bunk, greg, jdelvare, mishu, sergio |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.25-rc5 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 56331 | ||
Attachments: |
acpidump.txt
output of "cat /proc/acpi/thermal/*/*"
customized DSDT
dmesg with error for kernel from comment #12
kernel config used in these tests
dmesg with error for kernel from comment #12 with acpi_debug_level parameter
[PATCH] PCI: Revert SMBus unhide on HP Compaq nx6110
new lspci -nnv |
Description
Chuck Ebbert
2008-03-14 15:13:20 UTC
Will you please attach the output of acpidump?

Please try the patch: http://bugzilla.kernel.org/show_bug.cgi?id=9558#c33 and see if it helps. :)

Please attach the result of "cat /proc/acpi/thermal/*/*" as well.

Created attachment 15324 [details]
acpidump.txt
Created attachment 15325 [details]
output of "cat /proc/acpi/thermal/*/*"
Sorry, in c4 I meant "cat /proc/acpi/thermal_zone/*/*", because there is no /proc/acpi/thermal directory.

I'll report back the result of testing with the patch from comment #2.

With kernel 2.6.25-rc6-git-current + patch from comment #2 I have the same thing:
Mar 18 14:57:55 sysop kernel: ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
Mar 18 14:57:55 sysop kernel: ACPI Error (psparse-0537): Method parse/execution failed [\_TZ_.C247] (Node f7840c00), AE_AML_PACKAGE_LIMIT
Mar 18 14:57:55 sysop kernel: ACPI Error (psparse-0537): Method parse/execution failed [\_TZ_.C246] (Node f7840bc0), AE_AML_PACKAGE_LIMIT
Mar 18 14:57:55 sysop kernel: ACPI Error (psparse-0537): Method parse/execution failed [\_TZ_.TZ2_._TMP] (Node f7842ac0), AE_AML_PACKAGE_LIMIT
Mar 18 14:57:55 sysop kernel: ACPI: Critical trip point
Mar 18 14:57:55 sysop kernel: Critical temperature reached (1024 C), shutting down.
Mar 18 14:57:55 sysop kernel: ACPI: Critical trip point
Mar 18 14:57:55 sysop kernel: Critical temperature reached (1024 C), shutting down.
Mar 18 14:57:55 sysop kernel: Critical temperature reached (57 C), shutting down.
Mar 18 14:57:55 sysop shutdown[17493]: shutting down for system halt

With kernel 2.6.24.3 + patch from comment #2 I have had no problem so far (running for 1:35 hours).

With the kernel from comment #7 the critical shutdown came after about 54 min.

Created attachment 15340 [details]
customized DSDT
Please try the customized dsdt and attach the full dmesg output after a critical shutdown.
How to use a custom DSDT is described at http://www.lesswatts.org/projects/acpi/faq.php
Thanks.

Hi, Mihai
Will you please confirm whether the system works well if lm_sensors is not started?
Thanks.

I used the information from here: http://www.lesswatts.org/projects/acpi/overridingDSDT.php (it appears to be the same). I am now running kernel 2.6.25-rc6-git-current + patch from comment #2 + the custom DSDT built into this kernel, and waiting for the critical shutdown.

In response to comment #11: I confirm that if lm_sensors is not started, every kernel works without problems.

Created attachment 15348 [details]
dmesg with error for kernel from comment #12

Oops. Mihai, will you please redo the test with the kernel parameter acpi.debug_level=0x0f?

I see "Unknown boot option `acpi.debug_level=0x0f': ignoring" when trying that kernel parameter. Maybe I have to recompile the kernel with CONFIG_ACPI_DEBUG=y?

Created attachment 15354 [details]
kernel config used in these tests
Hi, Mihai
It seems that the bug is caused by a conflict between lm_sensors and the AML code.
In the AML code the COA1 method is called to get the temperature of the sensor, and it accesses the following addresses:
> OperationRegion (C09B, SystemIO, 0x1200, 0x06)
> Field (C09B, ByteAcc, NoLock, Preserve) {
>     C09C, 8,
>     Offset (0x02),
>     C09D, 8,
>     C09E, 8,
>     C09F, 8,
>     C0A0, 8 }
In fact, the above is the I/O address range of the I2C/SMBus host controller. The AML code uses direct I/O port access to read the temperature from the sensor.
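To make the overlap concrete, here is a small sketch that computes the I/O port behind each field of the region quoted above and compares it with the SMBus controller's port range from lspci (I/O ports at 1200 [size=32]). The register names in the lookup table are the ones I believe the Linux i2c-i801 driver uses for these offsets; treat them as an assumption, not as something taken from this report.

```python
# Values copied from: OperationRegion (C09B, SystemIO, 0x1200, 0x06)
REGION_BASE = 0x1200
REGION_LEN = 0x06

# Field list copied from the AML excerpt: (name, width in bits).
# An ("Offset", n) entry moves the cursor to byte offset n within the region.
FIELDS = [
    ("C09C", 8),
    ("Offset", 0x02),
    ("C09D", 8),
    ("C09E", 8),
    ("C09F", 8),
    ("C0A0", 8),
]

# Assumption: register names at these offsets in the i2c-i801 driver.
I801_REGISTERS = {
    0: "SMBHSTSTS",
    2: "SMBHSTCNT",
    3: "SMBHSTCMD",
    4: "SMBHSTADD",
    5: "SMBHSTDAT0",
}


def field_ports(base, fields):
    """Map each field name to the I/O port it occupies (byte-wide fields only)."""
    ports = {}
    bit = 0
    for name, value in fields:
        if name == "Offset":
            bit = value * 8
            continue
        if bit % 8 != 0 or value != 8:
            raise ValueError("this sketch only handles byte-aligned 8-bit fields")
        ports[name] = base + bit // 8
        bit += value
    return ports


PORTS = field_ports(REGION_BASE, FIELDS)

for name, port in PORTS.items():
    register = I801_REGISTERS.get(port - REGION_BASE, "?")
    print(f"{name}: port {port:#06x} ({register})")
```

Every field lands inside 0x1200-0x1205, which is the start of the 32-port window that lspci reports for the SMBus controller.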
After the i2c-i801 and lm90 drivers are loaded, /usr/bin/lmsensor uses the sysfs interface to read the state of the sensor, and the lm90 driver in turn accesses the SMBus host controller. Unfortunately there is no synchronization between the AML code and the i2c-i801 driver, so the \_TZ_.TZ2_._TMP method either returns an incorrect temperature or fails with the error messages reported above.
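The effect of two unsynchronized users of one host controller can be shown with a toy model. This is only an illustration: the class, the register numbers and the values are hypothetical, and a real SMBus transaction has more steps than the two modelled here.

```python
class FakeSMBusController:
    """Toy host controller with a single command register shared by all callers."""

    def __init__(self, device_registers):
        self.device_registers = device_registers
        self.command = None

    def set_command(self, register):
        self.command = register

    def read_data(self):
        return self.device_registers[self.command]


def read_register(controller, register):
    """Two-step transaction; the yield marks where another user may run."""
    controller.set_command(register)
    yield
    return controller.read_data()


def finish(transaction):
    """Run a transaction to completion and return its result."""
    try:
        while True:
            next(transaction)
    except StopIteration as stop:
        return stop.value


def run_interleaved(first, second):
    """Both users program the command register before either reads data."""
    next(first)
    next(second)
    return [finish(first), finish(second)]


def run_serialized(first, second):
    """Each transaction completes before the next one starts."""
    return [finish(first), finish(second)]


# Hypothetical sensor registers: 0x00 = temperature, 0x05 = high limit.
TEMP_REG, LIMIT_REG = 0x00, 0x05
controller = FakeSMBusController({TEMP_REG: 48, LIMIT_REG: 105})

# "AML" reads the temperature while the "driver" reads the limit.
racy = run_interleaved(read_register(controller, TEMP_REG),
                       read_register(controller, LIMIT_REG))
safe = run_serialized(read_register(controller, TEMP_REG),
                      read_register(controller, LIMIT_REG))

print("interleaved:", racy)  # the temperature read returns the limit value
print("serialized: ", safe)
```

In the interleaved run the first reader gets the value of the register selected by the second reader, which is the kind of bogus reading that can trip a critical threshold. Serializing the two users would need a lock that both sides honour, and the AML code here takes none.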
This seems to be a conflict between lm_sensors and the ACPI thermal zone. Loading lm_sensors causes the ACPI thermal zone to read a bogus temperature, which results in a critical shutdown.

Created attachment 15361 [details]
dmesg with error for kernel from comment #12 with acpi_debug_level parameter

So I need a new lm90 driver or a new lm_sensors, don't I?

Please tell us the vendor and model of your system. Please attach the output of /sbin/lspci -nnv. Please attach the output of sensors.

(In reply to comment #21)
> Please tell us the vendor and model of your system.
>

HP Compaq nc6120

> Please attach the output of /sbin/lspci -nnv.
>

[root@sysop ~]# /sbin/lspci -nnv
00:00.0 Host bridge [0600]: Intel Corporation Mobile 915GM/PM/GMS/910GML Express Processor to DRAM Controller [8086:2590] (rev 03)
	Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
	Flags: bus master, fast devsel, latency 0
	Capabilities: [e0] Vendor Specific Information <?>
	Kernel driver in use: agpgart-intel

00:02.0 VGA compatible controller [0300]: Intel Corporation Mobile 915GM/GMS/910GML Express Graphics Controller [8086:2592] (rev 03) (prog-if 00 [VGA controller])
	Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
	Flags: bus master, fast devsel, latency 0, IRQ 16
	Memory at d0400000 (32-bit, non-prefetchable) [size=512K]
	I/O ports at 7000 [size=8]
	Memory at c0000000 (32-bit, prefetchable) [size=256M]
	Memory at d0480000 (32-bit, non-prefetchable) [size=256K]
	Capabilities: [d0] Power Management version 2
	Kernel modules: intelfb

00:02.1 Display controller [0380]: Intel Corporation Mobile 915GM/GMS/910GML Express Graphics Controller [8086:2792] (rev 03)
	Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
	Flags: bus master, fast devsel, latency 0
	Memory at d0500000 (32-bit, non-prefetchable) [size=512K]
	Capabilities: [d0] Power Management version 2

00:1c.0 PCI bridge [0604]: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) PCI Express Port 1 [8086:2660] (rev 03) (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0
	Bus: primary=00, secondary=10, subordinate=10, sec-latency=0
	Capabilities: [40] Express Root Port (Slot+), MSI 00
	Capabilities: [80] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable-
	Capabilities: [90] Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
	Capabilities: [a0] Power Management version 2
	Kernel driver in use: pcieport-driver

00:1d.0 USB Controller [0c03]: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI #1 [8086:2658] (rev 03) (prog-if 00 [UHCI])
	Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
	Flags: bus master, medium devsel, latency 0, IRQ 20
	I/O ports at 2000 [size=32]
	Kernel driver in use: uhci_hcd
	Kernel modules: uhci-hcd

00:1d.1 USB Controller [0c03]: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI #2 [8086:2659] (rev 03) (prog-if 00 [UHCI])
	Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
	Flags: bus master, medium devsel, latency 0, IRQ 21
	I/O ports at 2020 [size=32]
	Kernel driver in use: uhci_hcd
	Kernel modules: uhci-hcd

00:1d.2 USB Controller [0c03]: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI #3 [8086:265a] (rev 03) (prog-if 00 [UHCI])
	Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
	Flags: bus master, medium devsel, latency 0, IRQ 17
	I/O ports at 2040 [size=32]
	Kernel driver in use: uhci_hcd
	Kernel modules: uhci-hcd

00:1d.3 USB Controller [0c03]: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI #4 [8086:265b] (rev 03) (prog-if 00 [UHCI])
	Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
	Flags: bus master, medium devsel, latency 0, IRQ 18
	I/O ports at 2060 [size=32]
	Kernel driver in use: uhci_hcd
	Kernel modules: uhci-hcd

00:1d.7 USB Controller [0c03]: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB2 EHCI Controller [8086:265c] (rev 03) (prog-if 20 [EHCI])
	Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
	Flags: bus master, medium devsel, latency 0, IRQ 20
	Memory at d0580000 (32-bit, non-prefetchable) [size=1K]
	Capabilities: [50] Power Management version 2
	Capabilities: [58] Debug port: BAR=1 offset=00a0
	Kernel driver in use: ehci_hcd
	Kernel modules: ehci-hcd

00:1e.0 PCI bridge [0604]: Intel Corporation 82801 Mobile PCI Bridge [8086:2448] (rev d3) (prog-if 01 [Subtractive decode])
	Flags: bus master, fast devsel, latency 0
	Bus: primary=00, secondary=02, subordinate=07, sec-latency=32
	Memory behind bridge: d0000000-d03fffff
	Capabilities: [50] Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]

00:1e.2 Multimedia audio controller [0401]: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) AC'97 Audio Controller [8086:266e] (rev 03)
	Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
	Flags: bus master, medium devsel, latency 0, IRQ 23
	I/O ports at 2100 [size=256]
	I/O ports at 2200 [size=64]
	Memory at d0581000 (32-bit, non-prefetchable) [size=512]
	Memory at d0582000 (32-bit, non-prefetchable) [size=256]
	Capabilities: [50] Power Management version 2
	Kernel driver in use: Intel ICH
	Kernel modules: snd-intel8x0

00:1e.3 Modem [0703]: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) AC'97 Modem Controller [8086:266d] (rev 03) (prog-if 00 [Generic])
	Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
	Flags: bus master, medium devsel, latency 0, IRQ 19
	I/O ports at 2400 [size=256]
	I/O ports at 2500 [size=128]
	Capabilities: [50] Power Management version 2
	Kernel driver in use: Intel ICH Modem
	Kernel modules: snd-intel8x0m

00:1f.0 ISA bridge [0601]: Intel Corporation 82801FBM (ICH6M) LPC Interface Bridge [8086:2641] (rev 03)
	Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
	Flags: bus master, medium devsel, latency 0
	Kernel modules: iTCO_wdt, intel-rng

00:1f.1 IDE interface [0101]: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) IDE Controller [8086:266f] (rev 03) (prog-if 8a [Master SecP PriP])
	Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
	Flags: bus master, medium devsel, latency 0, IRQ 16
	I/O ports at 01f0 [size=8]
	I/O ports at 03f4 [size=1]
	I/O ports at 0170 [size=8]
	I/O ports at 0374 [size=1]
	I/O ports at 2580 [size=16]
	Kernel driver in use: ata_piix
	Kernel modules: ata_piix

00:1f.3 SMBus [0c05]: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) SMBus Controller [8086:266a] (rev 03)
	Flags: medium devsel, IRQ 21
	I/O ports at 1200 [size=32]
	Kernel modules: i2c-i801

02:04.0 Network controller [0280]: Intel Corporation PRO/Wireless 2200BG Network Connection [8086:4220] (rev 05)
	Subsystem: Hewlett-Packard Company Compaq nw8240/nx8220 [103c:12f6]
	Flags: bus master, medium devsel, latency 64, IRQ 23
	Memory at d0000000 (32-bit, non-prefetchable) [size=4K]
	Capabilities: [dc] Power Management version 2
	Kernel driver in use: ipw2200
	Kernel modules: ipw2200

02:06.0 CardBus bridge [0607]: Texas Instruments PCIxx21/x515 Cardbus Controller [104c:8031]
	Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
	Flags: bus master, medium devsel, latency 168, IRQ 17
	Memory at d0001000 (32-bit, non-prefetchable) [size=4K]
	Bus: primary=02, secondary=03, subordinate=03, sec-latency=176
	Memory window 0: 40000000-43fff000 (prefetchable)
	Memory window 1: 44000000-47fff000
	I/O window 0: 00001400-000014ff
	I/O window 1: 00001800-000018ff
	16-bit legacy interface ports at 0001
	Kernel driver in use: yenta_cardbus

02:06.1 CardBus bridge [0607]: Texas Instruments PCIxx21/x515 Cardbus Controller [104c:8031]
	Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
	Flags: bus master, medium devsel, latency 168, IRQ 18
	Memory at d0002000 (32-bit, non-prefetchable) [size=4K]
	Bus: primary=02, secondary=04, subordinate=07, sec-latency=176
	Memory window 0: 48000000-4bfff000 (prefetchable)
	Memory window 1: 4c000000-4ffff000
	I/O window 0: 00001c00-00001cff
	I/O window 1: 00002800-000028ff
	16-bit legacy interface ports at 0001
	Kernel driver in use: yenta_cardbus

02:06.2 FireWire (IEEE 1394) [0c00]: Texas Instruments OHCI Compliant IEEE 1394 Host Controller [104c:8032] (prog-if 10 [OHCI])
	Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
	Flags: bus master, medium devsel, latency 64, IRQ 19
	Memory at d0003000 (32-bit, non-prefetchable) [size=2K]
	Memory at d0004000 (32-bit, non-prefetchable) [size=16K]
	Capabilities: [44] Power Management version 2
	Kernel driver in use: firewire_ohci
	Kernel modules: firewire-ohci

02:06.3 Mass storage controller [0180]: Texas Instruments PCIxx21 Integrated FlashMedia Controller [104c:8033]
	Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
	Flags: bus master, medium devsel, latency 64, IRQ 22
	Memory at d0008000 (32-bit, non-prefetchable) [size=8K]
	Capabilities: [44] Power Management version 2
	Kernel driver in use: tifm_7xx1
	Kernel modules: tifm_7xx1

02:06.4 SD Host controller [0805]: Texas Instruments PCI6411/6421/6611/6621/7411/7421/7611/7621 Secure Digital Controller [104c:8034]
	Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
	Flags: bus master, medium devsel, latency 64, IRQ 22
	Memory at d000a000 (32-bit, non-prefetchable) [size=256]
	Memory at d000b000 (32-bit, non-prefetchable) [size=256]
	Memory at d000c000 (32-bit, non-prefetchable) [size=256]
	Capabilities: [80] Power Management version 2
	Kernel driver in use: sdhci
	Kernel modules: sdhci

02:06.5 Communication controller [0780]: Texas Instruments PCI6411/6421/6611/6621/7411/7421/7611/7621 Smart Card Controller [104c:8035]
	Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
	Flags: medium devsel, IRQ 10
	Memory at d000d000 (32-bit, non-prefetchable) [size=4K]
	Memory at d000e000 (32-bit, non-prefetchable) [size=4K]
	Memory at d000f000 (32-bit, non-prefetchable) [size=4K]
	Memory at d0010000 (32-bit, non-prefetchable) [size=4K]
	Capabilities: [44] Power Management version 2

02:0e.0 Ethernet controller [0200]: Broadcom Corporation NetXtreme BCM5705M_2 Gigabit Ethernet [14e4:165e] (rev 03)
	Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
	Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 16
	Memory at d0020000 (64-bit, non-prefetchable) [size=64K]
	Expansion ROM at <ignored> [disabled]
	Capabilities: [48] Power Management version 2
	Capabilities: [50] Vital Product Data <?>
	Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+ Queue=0/3 Enable-
	Kernel driver in use: tg3
	Kernel modules: tg3

[root@sysop ~]#

> Please attach the output of sensors.
>

[root@sysop sysconfig]# sensors
adm1032-i2c-0-4c
Adapter: SMBus I801 adapter at 1200
M/B Temp: +48°C (low = -128°C, high = +105°C)
CPU Temp: +49.9°C (low = +45.0°C, high = +60.0°C)
M/B Crit: +104°C (hyst = +94°C)
CPU Crit: +103°C (hyst = +93°C)

We have a PCI quirk that unhides the SMBus on the HP Compaq nx6110:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3c0a654e390d00fef9d8faed758f5e1e8078adb5

Your HP Compaq nc6120 has the same PCI ID so the quirk is applied to your system as well. Unfortunately the SMBus is used by ACPI for thermal management on your system, so Linux should not attach a native driver to it. We have to remove the quirk in question. This will make the lm90 driver unusable on your system, but at least you will no longer experience random shutdowns. Patch follows.

Created attachment 15367 [details]
[PATCH] PCI: Revert SMBus unhide on HP Compaq nx6110
This reverts commit 3c0a654e390d00fef9d8faed758f5e1e8078adb5.
The HP Compaq nc6120 has the same PCI sub-device ID as the nx6110, and the SMBus is used by ACPI for thermal management on the nc6120, so Linux should not attach a native driver to it. This means that this quirk is unsafe and has to be removed.
I also added a comment to help developers realize that adding new IDs to this SMBus unhiding quirk table should be done only with great care, and in particular only after checking that ACPI is not making use of the SMBus.
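Why a quirk written for one model also hits the other can be sketched with a lookup keyed on PCI IDs. The table layout and function name below are hypothetical and do not reproduce the kernel's quirk code; the IDs are the host bridge and subsystem IDs from the nc6120 lspci output in this report.

```python
from typing import NamedTuple


class PciId(NamedTuple):
    vendor: int
    device: int
    subsys_vendor: int
    subsys_device: int


# Hypothetical quirk table with the single entry that was added for the nx6110.
UNHIDE_SMBUS_QUIRKS = {
    (0x8086, 0x2590, 0x103C, 0x099C): "HP Compaq nx6110",
}


def quirk_applies(host_bridge):
    """Return True if the SMBus would be unhidden for this host bridge."""
    return tuple(host_bridge) in UNHIDE_SMBUS_QUIRKS


# Host bridge [8086:2590] with subsystem [103c:099c], as reported by the nc6120.
nc6120_host_bridge = PciId(0x8086, 0x2590, 0x103C, 0x099C)

# A made-up machine with a different subsystem device ID, for contrast.
other_machine = PciId(0x8086, 0x2590, 0x103C, 0x0001)

print("nc6120 matched:", quirk_applies(nc6120_host_bridge))
print("other matched: ", quirk_applies(other_machine))
```

Because the match is made on IDs alone, nothing in such a table can tell apart two models that share a subsystem ID, which is why the entry had to be dropped rather than narrowed.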
Mihai, please try the patch in comment #24. The SMBus should disappear from lspci, and sensors will no longer work, but your shutdown problem should be fixed.

OK, I'll try the patch and come back with the results.

Created attachment 15387 [details]
new lspci -nnv
I am now running kernel 2.6.25-rc6 + patch from comment #24. Results:
- SMBus doesn't appear in the output of lspci;
- lm_sensors no longer works;
- no critical shutdown so far (running for 2:15 h).
So it could be said that the bug is fixed (but lm_sensors becomes useless in this case). If anything bad happens I'll come back again.

The lm90 driver wasn't telling you much more than the ACPI thermal zone anyway. You can get the same information with "acpi -t" or by reading /proc/acpi/thermal_zone/*/temperature. The only drawback is that libsensors is usually very well integrated into applications, while ACPI thermal zones are not. This will be addressed in kernel 2.6.26, where ACPI thermal zones will be exported to libsensors.

I will send the patch from comment #24 to the PCI subsystem maintainer (Greg KH) now.

Even after 21:38 hours no critical shutdown has occurred, so the bug is definitely fixed. Thanks for the patch and the explanations.

Mihai,
Please re-assign the bug to Jean or me so that we can update the bug status properly. :)

In response to comment #29: I don't have the acpi program (I use Fedora 8); I could use acpitool instead. Do they have the same functionality?

(In reply to comment #32)
Yes, "acpitool -t" should work as well.

Fixed by commit a99acc832de1104afaba02d7c2576fd9b9fd6422

Hello! And what about us poor nx6110 users? I'd like to use the SMBus, but it looks impossible on the latest kernel. Any suggestions?

Well, complain to the laptop manufacturer. They designed it (and its BIOS) in such a way that it is unsafe for any OS to make use of the SMBus. There's nothing we (Linux developers) can do about that. But anyway, I'm not sure why you insist on using the SMBus on this laptop. ACPI is supposed to handle the thermal management automatically so you shouldn't have to care. What problem are you trying to solve?
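For anyone without the acpi or acpitool programs, the /proc/acpi/thermal_zone/*/temperature files mentioned in comment #29 can be read directly. A minimal sketch follows; it assumes each file holds a single line of the form "temperature:   48 C", which is my recollection of the procfs format on kernels of this era and should be checked on the target system.

```python
import glob
import re

# Assumed line format: "temperature:             48 C"
_TEMPERATURE_LINE = re.compile(r"^temperature:\s*(-?\d+)\s*C$")


def parse_temperature(text):
    """Return the temperature in degrees C, or None if the text is not recognized."""
    match = _TEMPERATURE_LINE.match(text.strip())
    return int(match.group(1)) if match else None


def read_thermal_zones(pattern="/proc/acpi/thermal_zone/*/temperature"):
    """Map each matching file path to its parsed temperature."""
    readings = {}
    for path in glob.glob(pattern):
        with open(path) as handle:
            readings[path] = parse_temperature(handle.read())
    return readings


if __name__ == "__main__":
    for path, temperature in sorted(read_thermal_zones().items()):
        print(path, temperature)
```

On a system without that procfs directory the loop simply prints nothing.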
If I enable thermal zones in the kernel, it shuts down in 5-6 minutes with `ACPI: Critical Trip Section` in the logs (and the laptop itself is _cold_). Also, all the live CDs I have tried w/ the latest kernel can only boot on it w/ acpi=off (same thing: temperature is 180C, shutting down). So the thermal zone doesn't work for me. Can you please provide me w/ a patch for 2.6.26 which will enable the SMBus?

Report the problem to the ACPI folks and have them fix it. That's the only sane thing to do. Either that, or complain to the laptop vendor so that they fix their BIOS, if this is a BIOS issue. Have you tried upgrading your BIOS first? Enabling the SMBus on that laptop will cause more ACPI problems, not fewer (that's what this bug is all about). We could enable the SMBus conditionally when acpi=off, I guess, but it would take additional code, and I doubt this is worth the effort, given that this laptop probably can't run well with ACPI completely disabled anyway. Most post-2002 PCs can't run properly without ACPI support.

It seems to me that the SMBus messes up ACPI, and that this is not ACPI's fault ... As I reported for my Compaq nx6110, when the SMBus is unhidden I got one huge kernel warning oops after hibernating the laptop, and I got worse results when hibernating and suspending the laptop. Anyway, what new information could lm-sensors give us? Someone in this bug report says it offers nothing beyond what ACPI already provides; is that correct?