Bug 10245 - lm_sensors causes ACPI errors and critical thermal shutdown
Summary: lm_sensors causes ACPI errors and critical thermal shutdown
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: I2C
Hardware: All Linux
Importance: P1 high
Assignee: Jean Delvare
URL:
Keywords:
Depends on:
Blocks: 56331
Reported: 2008-03-14 15:13 UTC by Chuck Ebbert
Modified: 2013-04-09 06:23 UTC

See Also:
Kernel Version: 2.6.25-rc5
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
acpidump.txt (151.25 KB, text/plain)
2008-03-18 03:15 UTC, Mihai Harpau
output of "cat /proc/acpi/thermal/*/*" (912 bytes, text/plain)
2008-03-18 03:16 UTC, Mihai Harpau
customized DSDT (280.20 KB, application/octet-stream)
2008-03-19 01:12 UTC, Zhang Rui
dmesg with error for kernel from comment #12 (52.76 KB, text/plain)
2008-03-19 09:06 UTC, Mihai Harpau
kernel config used in these tests (83.13 KB, text/plain)
2008-03-20 05:46 UTC, Mihai Harpau
dmesg with error for kernel from comment #12 with acpi_debug_level parameter (114.26 KB, application/x-gzip)
2008-03-21 00:29 UTC, Mihai Harpau
[PATCH] PCI: Revert SMBus unhide on HP Compaq nx6110 (2.23 KB, patch)
2008-03-21 02:34 UTC, Jean Delvare
new lspci -nnv (10.54 KB, text/plain)
2008-03-21 16:49 UTC, Mihai Harpau

Description Chuck Ebbert 2008-03-14 15:13:20 UTC
Latest working kernel version: unknown
Earliest failing kernel version: 2.6.23.15
Also failing: 2.6.24.3, 2.6.25-rc5
Distribution: Fedora 8

https://bugzilla.redhat.com/show_bug.cgi?id=437466

Mar 14 13:22:20 sysop kernel: ACPI Exception (exoparg2-0442):
AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
Mar 14 13:22:20 sysop kernel: ACPI Error (psparse-0537): Method parse/execution
failed [\_TZ_.C247] (Node f781c2d0), AE_AML_PACKAGE_LIMIT
Mar 14 13:22:20 sysop kernel: ACPI Error (psparse-0537): Method parse/execution
failed [\_TZ_.C246] (Node f781c2b8), AE_AML_PACKAGE_LIMIT
Mar 14 13:22:20 sysop kernel: ACPI Error (psparse-0537): Method parse/execution
failed [\_TZ_.TZ2_._TMP] (Node f781c858), AE_AML_PACKAGE_LIMIT
Mar 14 13:22:20 sysop kernel: ACPI Exception (exoparg2-0442):
AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
Mar 14 13:22:20 sysop kernel: ACPI Error (psparse-0537): Method parse/execution
failed [\_TZ_.C247] (Node f781c2d0), AE_AML_PACKAGE_LIMIT
Mar 14 13:22:20 sysop kernel: ACPI Error (psparse-0537): Method parse/execution
failed [\_TZ_.C246] (Node f781c2b8), AE_AML_PACKAGE_LIMIT
Mar 14 13:22:20 sysop kernel: ACPI Error (psparse-0537): Method parse/execution
failed [\_TZ_.TZ2_._TMP] (Node f781c858), AE_AML_PACKAGE_LIMIT
Mar 14 13:22:20 sysop kernel: ACPI: Critical trip point
Mar 14 13:22:20 sysop kernel: Critical temperature reached (1024 C), shutting down.
Mar 14 13:22:20 sysop kernel: ACPI: Critical trip point
Mar 14 13:22:20 sysop kernel: Critical temperature reached (1024 C), shutting down.
Mar 14 13:22:20 sysop kernel: ACPI: Critical trip point
Mar 14 13:22:20 sysop kernel: Critical temperature reached (1024 C), shutting down.
Comment 1 ykzhao 2008-03-16 07:05:12 UTC
Will you please attach the output of acpidump?
Comment 2 Zhang Rui 2008-03-17 00:23:54 UTC
Please try the patch:
http://bugzilla.kernel.org/show_bug.cgi?id=9558#c33
and see if it helps. :)
Please attach the result of "cat /proc/acpi/thermal/*/*" as well.
Comment 3 Mihai Harpau 2008-03-18 03:15:12 UTC
Created attachment 15324
acpidump.txt
Comment 4 Mihai Harpau 2008-03-18 03:16:28 UTC
Created attachment 15325
output of "cat /proc/acpi/thermal/*/*"
Comment 5 Mihai Harpau 2008-03-18 03:18:46 UTC
Sorry, in comment #4 I meant "cat /proc/acpi/thermal_zone/*/*", because there is no /proc/acpi/thermal directory.
Comment 6 Mihai Harpau 2008-03-18 03:25:56 UTC
I'll report back the result of testing with the patch from comment #2
Comment 7 Mihai Harpau 2008-03-18 09:47:22 UTC
With kernel 2.6.25-rc6-git-current + the patch from comment #2 I get the same result:

Mar 18 14:57:55 sysop kernel: ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
Mar 18 14:57:55 sysop kernel: ACPI Error (psparse-0537): Method parse/execution failed [\_TZ_.C247] (Node f7840c00), AE_AML_PACKAGE_LIMIT
Mar 18 14:57:55 sysop kernel: ACPI Error (psparse-0537): Method parse/execution failed [\_TZ_.C246] (Node f7840bc0), AE_AML_PACKAGE_LIMIT
Mar 18 14:57:55 sysop kernel: ACPI Error (psparse-0537): Method parse/execution failed [\_TZ_.TZ2_._TMP] (Node f7842ac0), AE_AML_PACKAGE_LIMIT
Mar 18 14:57:55 sysop kernel: ACPI: Critical trip point
Mar 18 14:57:55 sysop kernel: Critical temperature reached (1024 C), shutting down.
Mar 18 14:57:55 sysop kernel: ACPI: Critical trip point
Mar 18 14:57:55 sysop kernel: Critical temperature reached (1024 C), shutting down.
Mar 18 14:57:55 sysop kernel: Critical temperature reached (57 C), shutting down.
Mar 18 14:57:55 sysop shutdown[17493]: shutting down for system halt
Comment 8 Mihai Harpau 2008-03-18 10:15:02 UTC
With kernel 2.6.24.3 + the patch from comment #2 I have had no problem so far (running for 1:35 hours).
With the kernel from comment #7 the critical shutdown came after about 54 minutes.
Comment 9 Zhang Rui 2008-03-19 01:12:08 UTC
Created attachment 15340
customized DSDT

Please try the customized DSDT and attach the full dmesg output after a critical shutdown.
Comment 10 ykzhao 2008-03-19 06:10:17 UTC
Instructions for using a custom DSDT can be found at
http://www.lesswatts.org/projects/acpi/faq.php

Thanks.
Comment 11 ykzhao 2008-03-19 06:12:14 UTC
Hi, Mihai
   Will you please confirm whether the system works correctly when lm_sensors is not started?
   Thanks.
Comment 12 Mihai Harpau 2008-03-19 06:25:45 UTC
I used the information from here: http://www.lesswatts.org/projects/acpi/overridingDSDT.php (it appears to be the same).
I am now running kernel 2.6.25-rc6-git-current + the patch from comment #2 + the custom DSDT built into this kernel, and waiting for the critical shutdown.
In response to comment #11: I confirm that if lm_sensors is not started, every kernel works without problems.
Comment 13 Mihai Harpau 2008-03-19 09:06:43 UTC
Created attachment 15348
dmesg with error for kernel from comment #12
Comment 14 Zhang Rui 2008-03-20 01:36:54 UTC
Oops,
Mihai, will you please redo the test with the kernel parameter acpi.debug_level=0x0f?
Comment 15 Mihai Harpau 2008-03-20 05:40:07 UTC
I see "Unknown boot option `acpi.debug_level=0x0f': ignoring" when trying that kernel parameter. Maybe I have to recompile the kernel with CONFIG_ACPI_DEBUG=y?
Comment 16 Mihai Harpau 2008-03-20 05:46:55 UTC
Created attachment 15354
kernel config used in these tests
Comment 17 ykzhao 2008-03-20 22:03:13 UTC
Hi, Mihai
   It seems that the bug is caused by a conflict between lm_sensors and the AML code.
   In the AML code, the COA1 method is called to get the temperature of the sensor, and it accesses the following addresses:
       OperationRegion (C09B, SystemIO, 0x1200, 0x06)
       Field (C09B, ByteAcc, NoLock, Preserve)
       {
           C09C,   8,
           Offset (0x02),
           C09D,   8,
           C09E,   8,
           C09F,   8,
           C0A0,   8
       }
   This region is in fact the I/O address range of the I2C/SMBus host controller: the AML code uses port I/O to read the temperature from the sensor.

   After the i2c-i801 and lm90 drivers are loaded, the lm_sensors userspace tool reads the sensor state through the sysfs interface, so the lm90 driver also accesses the SMBus host controller. Unfortunately, there is no synchronization between the AML code and the i2c-i801 driver, so the \_TZ_.TZ2_._TMP method either returns an incorrect temperature or fails with the errors reported above.
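
The race described in this comment can be illustrated with a small simulation. This is only a sketch: the ToyHostController class, the register numbers and the stored values are all made up for illustration and do not model the real ICH6 SMBus registers or the actual AML code. It shows how two callers that each perform a multi-step transaction on one shared controller can read each other's data when their steps interleave.

```python
# Toy model of two unsynchronized users of one shared bus controller.
# Register numbers and values are hypothetical, not real ICH6 or ADM1032 ones.

TEMP_REG = 0x01    # hypothetical "current temperature" register on the sensor
LIMIT_REG = 0x07   # hypothetical "critical limit" register on the sensor


class ToyHostController:
    """One command register and one data register shared by every caller."""

    def __init__(self, device_registers):
        self.device = device_registers  # register number -> value on the chip
        self.command = 0
        self.data = 0

    def set_command(self, register):
        self.command = register

    def start(self):
        # The transfer uses whatever command is latched at this moment.
        self.data = self.device[self.command]

    def read_data(self):
        return self.data


def transaction(bus, register):
    """Three-step read; yields between steps so callers can be interleaved."""
    bus.set_command(register)
    yield
    bus.start()
    yield
    return bus.read_data()


def simulate(schedule):
    """Run the 'acpi' and 'driver' transactions in the given step order."""
    bus = ToyHostController({TEMP_REG: 49, LIMIT_REG: 105})
    tasks = {
        "acpi": transaction(bus, TEMP_REG),     # stands in for the AML method
        "driver": transaction(bus, LIMIT_REG),  # stands in for the native driver
    }
    results = {}
    for name in schedule:
        try:
            next(tasks[name])
        except StopIteration as done:
            results[name] = done.value
    return results


if __name__ == "__main__":
    serialized = ["acpi"] * 3 + ["driver"] * 3
    interleaved = ["acpi", "driver"] * 3
    print("serialized: ", simulate(serialized))
    print("interleaved:", simulate(interleaved))
```

In the interleaved schedule the "acpi" caller receives the value of the limit register instead of the temperature, because the other caller overwrote the command register between its steps. A shared lock around the whole transaction would prevent this in the toy model; on the real machine the AML code and the native driver have no such common lock.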
   
   
   
    
Comment 18 Zhang Rui 2008-03-21 00:21:43 UTC
This seems to be a conflict between lm_sensors and the ACPI thermal zone.
With lm_sensors loaded, the ACPI thermal zone reads a bogus temperature, which results in a critical shutdown.
Comment 19 Mihai Harpau 2008-03-21 00:29:26 UTC
Created attachment 15361
dmesg with error for kernel from comment #12 with acpi_debug_level parameter
Comment 20 Mihai Harpau 2008-03-21 00:31:39 UTC
So I need a new lm90 driver or a new lm_sensors, don't I?
Comment 21 Jean Delvare 2008-03-21 01:17:31 UTC
Please tell us the vendor and model of your system.

Please attach the output of /sbin/lspci -nnv.

Please attach the output of sensors.
Comment 22 Mihai Harpau 2008-03-21 01:43:18 UTC
(In reply to comment #21)
> Please tell us the vendor and model of your system.
>
HP Compaq nc6120
> Please attach the output of /sbin/lspci -nnv.
>
[root@sysop ~]# /sbin/lspci -nnv
00:00.0 Host bridge [0600]: Intel Corporation Mobile 915GM/PM/GMS/910GML Express Processor to DRAM Controller [8086:2590] (rev 03)
        Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
        Flags: bus master, fast devsel, latency 0
        Capabilities: [e0] Vendor Specific Information <?>
        Kernel driver in use: agpgart-intel

00:02.0 VGA compatible controller [0300]: Intel Corporation Mobile 915GM/GMS/910GML Express Graphics Controller [8086:2592] (rev 03) (prog-if 00 [VGA controller])
        Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at d0400000 (32-bit, non-prefetchable) [size=512K]
        I/O ports at 7000 [size=8]
        Memory at c0000000 (32-bit, prefetchable) [size=256M]
        Memory at d0480000 (32-bit, non-prefetchable) [size=256K]
        Capabilities: [d0] Power Management version 2
        Kernel modules: intelfb

00:02.1 Display controller [0380]: Intel Corporation Mobile 915GM/GMS/910GML Express Graphics Controller [8086:2792] (rev 03)
        Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
        Flags: bus master, fast devsel, latency 0
        Memory at d0500000 (32-bit, non-prefetchable) [size=512K]
        Capabilities: [d0] Power Management version 2

00:1c.0 PCI bridge [0604]: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) PCI Express Port 1 [8086:2660] (rev 03) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=00, secondary=10, subordinate=10, sec-latency=0
        Capabilities: [40] Express Root Port (Slot+), MSI 00
        Capabilities: [80] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable-
        Capabilities: [90] Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
        Capabilities: [a0] Power Management version 2
        Kernel driver in use: pcieport-driver

00:1d.0 USB Controller [0c03]: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI #1 [8086:2658] (rev 03) (prog-if 00 [UHCI])
        Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
        Flags: bus master, medium devsel, latency 0, IRQ 20
        I/O ports at 2000 [size=32]
        Kernel driver in use: uhci_hcd
        Kernel modules: uhci-hcd

00:1d.1 USB Controller [0c03]: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI #2 [8086:2659] (rev 03) (prog-if 00 [UHCI])
        Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
        Flags: bus master, medium devsel, latency 0, IRQ 21
        I/O ports at 2020 [size=32]
        Kernel driver in use: uhci_hcd
        Kernel modules: uhci-hcd

00:1d.2 USB Controller [0c03]: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI #3 [8086:265a] (rev 03) (prog-if 00 [UHCI])
        Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
        Flags: bus master, medium devsel, latency 0, IRQ 17
        I/O ports at 2040 [size=32]
        Kernel driver in use: uhci_hcd
        Kernel modules: uhci-hcd

00:1d.3 USB Controller [0c03]: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI #4 [8086:265b] (rev 03) (prog-if 00 [UHCI])
        Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
        Flags: bus master, medium devsel, latency 0, IRQ 18
        I/O ports at 2060 [size=32]
        Kernel driver in use: uhci_hcd
        Kernel modules: uhci-hcd

00:1d.7 USB Controller [0c03]: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB2 EHCI Controller [8086:265c] (rev 03) (prog-if 20 [EHCI])
        Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
        Flags: bus master, medium devsel, latency 0, IRQ 20
        Memory at d0580000 (32-bit, non-prefetchable) [size=1K]
        Capabilities: [50] Power Management version 2
        Capabilities: [58] Debug port: BAR=1 offset=00a0
        Kernel driver in use: ehci_hcd
        Kernel modules: ehci-hcd

00:1e.0 PCI bridge [0604]: Intel Corporation 82801 Mobile PCI Bridge [8086:2448] (rev d3) (prog-if 01 [Subtractive decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=00, secondary=02, subordinate=07, sec-latency=32
        Memory behind bridge: d0000000-d03fffff
        Capabilities: [50] Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]

00:1e.2 Multimedia audio controller [0401]: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) AC'97 Audio Controller [8086:266e] (rev 03)
        Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
        Flags: bus master, medium devsel, latency 0, IRQ 23
        I/O ports at 2100 [size=256]
        I/O ports at 2200 [size=64]
        Memory at d0581000 (32-bit, non-prefetchable) [size=512]
        Memory at d0582000 (32-bit, non-prefetchable) [size=256]
        Capabilities: [50] Power Management version 2
        Kernel driver in use: Intel ICH
        Kernel modules: snd-intel8x0

00:1e.3 Modem [0703]: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) AC'97 Modem Controller [8086:266d] (rev 03) (prog-if 00 [Generic])
        Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
        Flags: bus master, medium devsel, latency 0, IRQ 19
        I/O ports at 2400 [size=256]
        I/O ports at 2500 [size=128]
        Capabilities: [50] Power Management version 2
        Kernel driver in use: Intel ICH Modem
        Kernel modules: snd-intel8x0m

00:1f.0 ISA bridge [0601]: Intel Corporation 82801FBM (ICH6M) LPC Interface Bridge [8086:2641] (rev 03)
        Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
        Flags: bus master, medium devsel, latency 0
        Kernel modules: iTCO_wdt, intel-rng

00:1f.1 IDE interface [0101]: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) IDE Controller [8086:266f] (rev 03) (prog-if 8a [Master SecP PriP])
        Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
        Flags: bus master, medium devsel, latency 0, IRQ 16
        I/O ports at 01f0 [size=8]
        I/O ports at 03f4 [size=1]
        I/O ports at 0170 [size=8]
        I/O ports at 0374 [size=1]
        I/O ports at 2580 [size=16]
        Kernel driver in use: ata_piix
        Kernel modules: ata_piix

00:1f.3 SMBus [0c05]: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) SMBus Controller [8086:266a] (rev 03)
        Flags: medium devsel, IRQ 21
        I/O ports at 1200 [size=32]
        Kernel modules: i2c-i801

02:04.0 Network controller [0280]: Intel Corporation PRO/Wireless 2200BG Network Connection [8086:4220] (rev 05)
        Subsystem: Hewlett-Packard Company Compaq nw8240/nx8220 [103c:12f6]
        Flags: bus master, medium devsel, latency 64, IRQ 23
        Memory at d0000000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [dc] Power Management version 2
        Kernel driver in use: ipw2200
        Kernel modules: ipw2200

02:06.0 CardBus bridge [0607]: Texas Instruments PCIxx21/x515 Cardbus Controller [104c:8031]
        Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
        Flags: bus master, medium devsel, latency 168, IRQ 17
        Memory at d0001000 (32-bit, non-prefetchable) [size=4K]
        Bus: primary=02, secondary=03, subordinate=03, sec-latency=176
        Memory window 0: 40000000-43fff000 (prefetchable)
        Memory window 1: 44000000-47fff000
        I/O window 0: 00001400-000014ff
        I/O window 1: 00001800-000018ff
        16-bit legacy interface ports at 0001
        Kernel driver in use: yenta_cardbus

02:06.1 CardBus bridge [0607]: Texas Instruments PCIxx21/x515 Cardbus Controller [104c:8031]
        Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
        Flags: bus master, medium devsel, latency 168, IRQ 18
        Memory at d0002000 (32-bit, non-prefetchable) [size=4K]
        Bus: primary=02, secondary=04, subordinate=07, sec-latency=176
        Memory window 0: 48000000-4bfff000 (prefetchable)
        Memory window 1: 4c000000-4ffff000
        I/O window 0: 00001c00-00001cff
        I/O window 1: 00002800-000028ff
        16-bit legacy interface ports at 0001
        Kernel driver in use: yenta_cardbus

02:06.2 FireWire (IEEE 1394) [0c00]: Texas Instruments OHCI Compliant IEEE 1394 Host Controller [104c:8032] (prog-if 10 [OHCI])
        Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
        Flags: bus master, medium devsel, latency 64, IRQ 19
        Memory at d0003000 (32-bit, non-prefetchable) [size=2K]
        Memory at d0004000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [44] Power Management version 2
        Kernel driver in use: firewire_ohci
        Kernel modules: firewire-ohci

02:06.3 Mass storage controller [0180]: Texas Instruments PCIxx21 Integrated FlashMedia Controller [104c:8033]
        Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
        Flags: bus master, medium devsel, latency 64, IRQ 22
        Memory at d0008000 (32-bit, non-prefetchable) [size=8K]
        Capabilities: [44] Power Management version 2
        Kernel driver in use: tifm_7xx1
        Kernel modules: tifm_7xx1

02:06.4 SD Host controller [0805]: Texas Instruments PCI6411/6421/6611/6621/7411/7421/7611/7621 Secure Digital Controller [104c:8034]
        Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
        Flags: bus master, medium devsel, latency 64, IRQ 22
        Memory at d000a000 (32-bit, non-prefetchable) [size=256]
        Memory at d000b000 (32-bit, non-prefetchable) [size=256]
        Memory at d000c000 (32-bit, non-prefetchable) [size=256]
        Capabilities: [80] Power Management version 2
        Kernel driver in use: sdhci
        Kernel modules: sdhci

02:06.5 Communication controller [0780]: Texas Instruments PCI6411/6421/6611/6621/7411/7421/7611/7621 Smart Card Controller [104c:8035]
        Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
        Flags: medium devsel, IRQ 10
        Memory at d000d000 (32-bit, non-prefetchable) [size=4K]
        Memory at d000e000 (32-bit, non-prefetchable) [size=4K]
        Memory at d000f000 (32-bit, non-prefetchable) [size=4K]
        Memory at d0010000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [44] Power Management version 2

02:0e.0 Ethernet controller [0200]: Broadcom Corporation NetXtreme BCM5705M_2 Gigabit Ethernet [14e4:165e] (rev 03)
        Subsystem: Hewlett-Packard Company NX6110/NC6120 [103c:099c]
        Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 16
        Memory at d0020000 (64-bit, non-prefetchable) [size=64K]
        Expansion ROM at <ignored> [disabled]
        Capabilities: [48] Power Management version 2
        Capabilities: [50] Vital Product Data <?>
        Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+ Queue=0/3 Enable-
        Kernel driver in use: tg3
        Kernel modules: tg3

[root@sysop ~]#  
> Please attach the output of sensors.
> 
[root@sysop sysconfig]# sensors
adm1032-i2c-0-4c
Adapter: SMBus I801 adapter at 1200
M/B Temp:    +48°C  (low  =  -128°C, high =  +105°C)   
CPU Temp:  +49.9°C  (low  = +45.0°C, high = +60.0°C)   
M/B Crit:   +104°C  (hyst =   +94°C)                  
CPU Crit:   +103°C  (hyst =   +93°C)                  
Comment 23 Jean Delvare 2008-03-21 02:16:15 UTC
We have a PCI quirk that unhides the SMBus on the HP Compaq nx6110:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3c0a654e390d00fef9d8faed758f5e1e8078adb5

Your HP Compaq nc6120 has the same PCI ID so the quirk is applied to your system as well. Unfortunately the SMBus is used by ACPI for thermal management on your system, so Linux should not attach a native driver to it. We have to remove the quirk in question. This will make the lm90 driver unusable on your system, but at least you will no longer experience random shutdowns. Patch follows.
Comment 24 Jean Delvare 2008-03-21 02:34:21 UTC
Created attachment 15367
[PATCH] PCI: Revert SMBus unhide on HP Compaq nx6110

This reverts commit 3c0a654e390d00fef9d8faed758f5e1e8078adb5.

The HP Compaq nc6120 has the same PCI sub-device ID as the nx6110, and the SMBus is used by ACPI for thermal management on the nc6120, so Linux should not attach a native driver to it. This means that this quirk is unsafe and has to be removed.

I also added a comment to help developers realize that adding new IDs to this SMBus unhiding quirk table should be done only with great care, and in particular only after checking that ACPI is not making use of the SMBus.
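
Why an ID-based quirk cannot tell the two laptops apart can be sketched as follows. This is an illustration only: the table and the quirk_applies function are hypothetical stand-ins, not the kernel's quirk code. The subsystem ID 103c:099c is the one shown in the lspci output in comment #22.

```python
# Toy model of a quirk table keyed on PCI subsystem IDs.
# UNHIDE_SMBUS_FOR and quirk_applies are hypothetical names for illustration.
from typing import NamedTuple


class PciSubsystem(NamedTuple):
    vendor: int
    device: int


# Entry that was meant to cover only the HP Compaq nx6110.
UNHIDE_SMBUS_FOR = {PciSubsystem(0x103C, 0x099C)}

# Both models report the same subsystem ID (see the lspci output above).
MACHINES = {
    "HP Compaq nx6110": PciSubsystem(0x103C, 0x099C),
    "HP Compaq nc6120": PciSubsystem(0x103C, 0x099C),
}


def quirk_applies(subsystem):
    """Return True if the ID table would unhide the SMBus for this machine."""
    return subsystem in UNHIDE_SMBUS_FOR


if __name__ == "__main__":
    for model, subsystem in MACHINES.items():
        print(model, "-> SMBus unhidden:", quirk_applies(subsystem))
```

Because the lookup key is identical for both models, any table entry that enables the quirk for one of them enables it for the other as well, which is why the entry had to be removed rather than narrowed.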
Comment 25 Jean Delvare 2008-03-21 02:35:50 UTC
Mihai, please try the patch in comment #24. The SMBus should disappear from lspci, and sensors will no longer work, but your shutdown problem should be fixed.
Comment 26 Mihai Harpau 2008-03-21 04:29:25 UTC
OK, I'll try the patch and come back with the results.
Comment 27 Mihai Harpau 2008-03-21 16:49:09 UTC
Created attachment 15387
new lspci -nnv
Comment 28 Mihai Harpau 2008-03-21 16:50:02 UTC
I am now running kernel 2.6.25-rc6 + the patch from comment #24. Results:
- the SMBus no longer appears in the lspci output;
- lm_sensors no longer works;
- no critical shutdown so far (running for 2:15 h).
So it can be said that the bug is fixed (although lm_sensors becomes useless in this case). If anything bad happens I'll come back again.
Comment 29 Jean Delvare 2008-03-22 01:28:33 UTC
The lm90 driver wasn't telling you much more than the ACPI thermal zone anyway. You can get the same information with "acpi -t" or by reading /proc/acpi/thermal_zone/*/temperature. The only drawback is that libsensors is usually very well integrated into applications, while ACPI thermal zones are not quite so. This will be addressed in kernel 2.6.26, where ACPI thermal zones will be exported to libsensors.

I will send the patch from comment #24 to the PCI subsystem maintainer (Greg KH) now.
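
For scripting, the thermal zone files mentioned above can be read with a few lines of Python. This is a minimal sketch that assumes each temperature file contains a single line of the form "temperature:   48 C"; that format is an assumption here, and the function names are made up for illustration.

```python
# Read ACPI thermal zone temperatures from the /proc interface.
# Assumes a line such as "temperature:             48 C" in each file.
import glob
import re

_TEMPERATURE_LINE = re.compile(r"^temperature:\s*(-?\d+)\s*C\s*$")


def parse_temperature(text):
    """Return the temperature in degrees Celsius, or None if no line matches."""
    for line in text.splitlines():
        match = _TEMPERATURE_LINE.match(line.strip())
        if match:
            return int(match.group(1))
    return None


def read_all_zones(pattern="/proc/acpi/thermal_zone/*/temperature"):
    """Map each matching file path to its parsed temperature (or None)."""
    readings = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            readings[path] = parse_temperature(handle.read())
    return readings


if __name__ == "__main__":
    for path, celsius in read_all_zones().items():
        print(path, celsius)
```

On a system without that /proc directory, read_all_zones() simply returns an empty dictionary.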
Comment 30 Mihai Harpau 2008-03-22 12:11:38 UTC
Even after 21:38 hours no critical shutdown has occurred, so the bug is definitely fixed. Thanks for the patch and the explanations.
Comment 31 Zhang Rui 2008-03-23 23:13:51 UTC
Mihai,
Please re-assign the bug to Jean or me so that we can update the bug status properly. :)
Comment 32 Mihai Harpau 2008-03-23 23:34:29 UTC
In response to comment #29:
I don't have the acpi program (I use Fedora 8); I could use acpitool instead. Do they have the same functionality?
Comment 33 Jean Delvare 2008-03-24 02:10:33 UTC
(In reply to comment #32)
Yes, "acpitool -t" should work as well.
Comment 34 Adrian Bunk 2008-03-28 15:25:19 UTC
fixed by commit a99acc832de1104afaba02d7c2576fd9b9fd6422
Comment 35 Andrew Kirilenko 2008-08-16 08:40:25 UTC
Hello!

And what about us poor nx6110 users? I'd like to use the SMBus, but it looks impossible on the latest kernel. Any suggestions?
Comment 36 Jean Delvare 2008-08-16 08:51:12 UTC
Well, complain to the laptop manufacturer. They designed it (and its BIOS) in such a way that it is unsafe for any OS to make use of the SMBus. There's nothing we (Linux developers) can do about that.

But anyway, I'm not sure why you insist on using the SMBus on this laptop. ACPI is supposed to handle the thermal management automatically so you shouldn't have to care. What problem are you trying to solve?
Comment 37 Andrew Kirilenko 2008-08-16 09:36:32 UTC
If I enable thermal zones in the kernel, it shuts down in 5-6 minutes with `ACPI: Critical Trip Section` in the logs (and the laptop itself is _cold_).

Also, all the live CDs with the latest kernel that I have tried can only boot on it with acpi=off (same problem: temperature is 180C, shutting down).

So the thermal zone doesn't work for me. Can you please provide me with a patch for 2.6.26 that enables the SMBus?
Comment 38 Jean Delvare 2008-08-16 09:48:18 UTC
Report the problem to the ACPI folks and have them fix it. That's the only sane thing to do. Either that, or complain to the laptop vendor so that they fix their BIOS, if this is a BIOS issue. Have you tried upgrading your BIOS first?

Enabling the SMBus on that laptop will cause more ACPI problems, not less (that's what this bug is all about.) We could enable the SMBus conditionally when acpi=off, I guess, but it would take additional code, and I doubt this is worth the effort, given that this laptop probably can't run well with ACPI completely disabled anyway. Most post-2002 PCs can't run properly without ACPI support.
Comment 39 Sérgio M Basto 2008-08-25 18:48:03 UTC
It seems to me that the SMBus messes up ACPI, and that this is not ACPI's fault...

I had problems on my Compaq nx6110 too: when the SMBus was unhidden, I got a huge kernel warning (oops) after hibernating the laptop, and hibernation and suspend worked worse overall.

Anyway, what new information could lm-sensors give us? Someone in this bug report says it provides nothing beyond what ACPI already reports; is that correct?
