Bug 5872 - kacpid takes 99% cpu with lm_sensors - asus m6va M6800VA
Summary: kacpid takes 99% cpu with lm_sensors - asus m6va M6800VA
Status: REJECTED INVALID
Alias: None
Product: Drivers
Classification: Unclassified
Component: Hardware Monitoring
Hardware: i386 Linux
Importance: P2 normal
Assignee: Jean Delvare
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-01-11 17:17 UTC by Pavel Sanda
Modified: 2007-03-07 02:02 UTC
CC List: 2 users

See Also:
Kernel Version: 2.6.15
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
lspci output (1.80 KB, text/plain)
2006-01-11 17:18 UTC, Pavel Sanda
dmesg output (13.78 KB, text/plain)
2006-01-11 17:19 UTC, Pavel Sanda
kernel config (34.89 KB, text/plain)
2006-01-11 17:20 UTC, Pavel Sanda
output of acpidump (227.12 KB, text/plain)
2006-01-23 06:35 UTC, Pavel Sanda
requested files (1.96 KB, application/octet-stream)
2006-01-31 15:58 UTC, Pavel Sanda
requested dump (1.20 KB, text/plain)
2006-02-01 03:19 UTC, Pavel Sanda
hardware info (48.65 KB, text/plain)
2006-02-01 03:48 UTC, Pavel Sanda
cpuinfo (434 bytes, text/plain)
2006-02-01 04:34 UTC, Pavel Sanda

Description Pavel Sanda 2006-01-11 17:17:16 UTC
Most recent kernel where this bug did not occur:
Distribution: gentoo
Hardware Environment: intel pentium M 750, 512 mb ram
Software Environment:
Problem Description: when booting, kacpid starts consuming all cpu time.
the pci=routeirq parameter didn't help (acpi=off of course helps).
the exact phase where kacpid goes wrong varies - after rebooting it's very soon,
after shutdown it's at the end, when (lm_)sensors are initialized.
if more attachments are needed let me know.

Steps to reproduce: boot with acpi
Comment 1 Pavel Sanda 2006-01-11 17:18:26 UTC
Created attachment 6990
lspci output
Comment 2 Pavel Sanda 2006-01-11 17:19:28 UTC
Created attachment 6991
dmesg output
Comment 3 Pavel Sanda 2006-01-11 17:20:29 UTC
Created attachment 6992
kernel config
Comment 4 Pavel Sanda 2006-01-23 06:30:55 UTC
it's the same in the linux-2.6.16-rc1 kernel.
i have found that kacpid starts to behave this way after initializing lm_sensors
via the "sensors -s" command in the initscripts.
the previously noted variability is as follows:
once kacpid goes to 99% (i attach acpidump below), after a restart it behaves
the same way from early boot.
one has to shut down the notebook (asus m6va) completely to get rid of it.


Comment 5 Pavel Sanda 2006-01-23 06:35:39 UTC
Created attachment 7102
output of acpidump
Comment 6 Jean Delvare 2006-01-29 09:32:35 UTC
Is the problem new in 2.6.15? Was it OK in 2.6.14.6 for example?

Does the problem happen every time you run "sensors -s", or only some times?

Does the problem also sometimes happen when just reading the hardware monitoring
values (e.g. by running "sensors")?

Does the problem happen only when "sensors -s" is run during the boot sequence,
or does it also happen if you manually run "sensors -s" afterwards?

Which motherboard or system is this?
Comment 7 Pavel Sanda 2006-01-29 14:09:02 UTC
1. 2.6.14 had the same problem.
2. yes, the problem occurs whenever sensors -s is run.
3. when running sensors without initialization (i.e. without "-s")
   the problem doesn't arise and the values seem to be correct.
4. when i boot without sensors -s and afterwards run it manually,
   kacpid also takes all CPU - i.e. it seems to have nothing to do
   with the boot process as such.
5. sensors-detect found the i2c-i801 bus and the lm85 and eeprom chips.


a few days ago i also posted it in the lm_sensors conference.
Comment 8 Jean Delvare 2006-01-31 01:06:27 UTC
Please give the exact manufacturer and model of your motherboard or system.

Your report suggests that the problem is triggered by some limit value being 
written to the LM85 (or compatible) chip registers. Please provide:
* The output of "sensors" before you run "sensors -s".
* The output of "sensors" after you run "sensors -s". Please run "sensors" a 
couple of times with several seconds in between so that alarms have time to 
trigger and wear off as needed.
* The relevant section of your /etc/sensors.conf file.

Running "sensors -s" scans your /etc/sensors.conf file for "set" statements for 
your chip, and writes values to the chip registers accordingly. We need to find 
out which statement triggers the kacpid problem.

Also, do you have the fan and/or thermal acpi modules loaded? If so, please 
provide all the information available under /proc/acpi/fan 
and /proc/acpi/thermal_zone, before and after running "sensors -s".
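
To narrow down which statement is responsible, it can help to list the "set" statements that "sensors -s" would act on. The following is a minimal sketch in Python, assuming the lm_sensors 2.x sensors.conf syntax ("chip" lines, "set" lines, "#" comments). The function name is made up for illustration and is not part of lm_sensors.

```python
def find_set_statements(conf_text):
    """Return (line_number, chip_line, statement) for each active "set" line.

    Hypothetical helper: it only looks at the first keyword of each line and
    does not evaluate expressions the way libsensors does.
    """
    results = []
    current_chip = None
    for number, raw in enumerate(conf_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        keyword = line.split()[0]
        if keyword == "chip":
            current_chip = line  # remember which chip section we are in
        elif keyword == "set":
            results.append((number, current_chip, line))
    return results


if __name__ == "__main__":
    with open("/etc/sensors.conf") as conf:
        for number, chip, statement in find_set_statements(conf.read()):
            print(number, chip, statement)
```

Commenting out the listed statements one at a time and re-running "sensors -s" would then show which one triggers the kacpid problem.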
Comment 9 Pavel Sanda 2006-01-31 15:58:34 UTC
Created attachment 7191
requested files

note that the output of sensors after "sensors -s" changes,
and the FAULT and ALARM flags dynamically change their positions.
Comment 10 Pavel Sanda 2006-01-31 16:06:21 UTC
i currently have no paper documentation for this notebook - it's an asus m6va (M6800VA).
i have no idea how to get such info via software.
Comment 11 Pavel Sanda 2006-01-31 16:34:34 UTC
after some fight i found that commenting out the following line in sensors.conf helps:

   set in1_min  vid * 0.95
Comment 12 Jean Delvare 2006-02-01 01:43:46 UTC
No binary attachment next time please. If there are too many files for 
individual attachments, just concatenate them into a single, larger text file.

What strikes me now is that none of the sensors values seem to make any sense, 
except maybe for the CPU temperature. Your ADT7463 chip seems not to be 
properly wired for complete hardware monitoring.

The fact that "sensors -s" confuses the ACPI thermal and/or fan drivers makes 
it clear to me that the ADT7463 chip is used by ACPI. Maybe Asus wired the 
minimum required for the ACPI functions (maybe just one temperature), although 
it would be a weird choice to choose a full-featured chip and then leave it 
mostly unused. At this point, the most simple solution is that you simply do 
not load the lm85 driver at all - it doesn't provide any valuable information 
over what ACPI already provides as far as I can see.

Could you please provide a dump of your chip?
modprobe i2c-dev
i2cdump 0 0x2d b
I'd like to make sure we are not facing some misdetection case.

It's still not OK that kacpid can eat up all the CPU in such conditions, BTW, 
but I'll leave it to the ACPI people, as I have no idea of what happens here.

Note that you may obtain information about your system (manufacturer, model, 
etc.) by running the dmidecode command line tool. You may also ask Asus for 
information about this ADT7463 chip - how is it wired exactly?
Comment 13 Pavel Sanda 2006-02-01 03:19:28 UTC
Created attachment 7200
requested dump

unloading lm85 is not needed i think - the only thing needed is either
not to initialize, or to bypass the referred line in the config.

it is true, there is only one temperature in THRM, but the chip
provides much information in /sys/bus/i2c/drivers/lm85... and they
seem correct - gkrellm sees them by itself.

at night i played with sensors in M$ windows and they work correctly - 
i can read all the values - they are similar to the output from 
lm_sensors without init. also i was able to change individual fans according
to the temperatures. so it seems, the hardware is prepared for it.

i can post dmidecode output if you wish.
Comment 14 Pavel Sanda 2006-02-01 03:48:16 UTC
Created attachment 7201
hardware info

attaching what i was able to dig out in M$ Win about hardware.
cpu freq and voltage are below normal as i have it set by default
this way.
Comment 15 Jean Delvare 2006-02-01 04:02:41 UTC
This ain't no misdetection for sure.

Do the values from the lm85 driver (as displayed by "sensors") make sense to 
you? They sure don't make much sense to me (which is why I was suggesting not 
to load the driver at all.)

in0, in2 and in4 are obviously not correct. in3 may make sense if it's +3.3V 
and not +5V, but it would mean the chip wasn't properly wired. in1 may make 
sense if the nominal voltage of your CPU is 1.0V. Is it?

You should comment out all the "set inN_min" and "set inN_max" lines where N is 
0, 2 or 4 in your configuration file. You may also change the in1_min, in1_max 
and in3_min limits as follows:
   set in1_min 0.9
   set in1_max 1.4
   set in3_min 3.3 * 0.9
These can be refined later. The "vid" value doesn't seem to be correct so you 
shouldn't use it when setting limits.

Fans: how many fans are there in this notebook? I guess only one. Does either 
the fan1 or fan2 value change slightly over time? If you have some form of fan 
control enabled (either manual or automatic), does either the fan1 or fan2 
reading reflect the speed changes in a plausible way?

Temperatures: do they fluctuate according to the system load? Do any of them 
tend to match the ACPI reading?

BTW, you should comment out the "set vrm" line in the configuration file, 
detection is supposed to be automatic in recent 2.6 kernels. I have yet to 
check if Pentium M series are covered.
Comment 16 Jean Delvare 2006-02-01 04:17:41 UTC
The VRM table used by this family of CPU is actually not yet supported, I'll 
work on it. It doesn't matter much here anyway as the VID pins of the chip 
appear not to be wired (according to the chip dump you sent).

Can you please provide the contents of /proc/cpuinfo?
Comment 17 Pavel Sanda 2006-02-01 04:27:21 UTC
the values from sensors make sense to me as far as TEMP and FANS
are concerned - and they are what i wanted to know. the meaning of
the voltages and all the "inXXX" is not completely clear to me.

i know about 3 fans - 1.CPU, 2.Remote, 3.Graphic card.
1 & 2 vary slightly over time. moreover when CPU load
goes to 100%, fan1 automatically goes high as well. the sensors
values change appropriately in the sense that they go higher/lower,
but i'm not able to check the exact value.

Temperatures depend on system load and ACPI THRM is CPU temp +/- 
1 degree.

Comment 18 Pavel Sanda 2006-02-01 04:34:31 UTC
Created attachment 7202
cpuinfo
Comment 19 Dimitri 2006-04-21 23:56:33 UTC
I have the same problem. But i found that everything's fine when the CPU runs at 798
MHz, even sensors -s doesn't cause trouble. When the CPU switches frequency to 1862
MHz, kacpid takes all the cpu, and sensors outputs ridiculous data. When the CPU switches
back to 798 MHz, everything is back to normal.

I'm using asus M9V. intel pentium M 750, 1gb ram
Comment 20 Jean Delvare 2006-04-25 07:08:34 UTC
Dimitri, I guess that the Pentium M 750 has a lower Vcore when running at low
frequency than high frequency. If your hardware monitoring chip has its limit
greater than the low frequency voltage, but below the high frequency voltage,
that would explain what you observe.
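
The reasoning above can be sketched as a simple limit check. This is only an illustration in Python: it assumes the limit in question is an upper limit, and the voltages are hypothetical placeholders, not values taken from this report.

```python
def limit_alarms(reading, low=None, high=None):
    """Return the names of the limits that a reading violates."""
    alarms = []
    if low is not None and reading < low:
        alarms.append("min")
    if high is not None and reading > high:
        alarms.append("max")
    return alarms


# Hypothetical values for illustration only (volts).
VCORE_LOW_FREQ = 1.0   # assumed Vcore at the low CPU frequency
VCORE_HIGH_FREQ = 1.3  # assumed Vcore at the high CPU frequency
IN1_MAX = 1.2          # assumed limit lying between the two

if __name__ == "__main__":
    print("low frequency: ", limit_alarms(VCORE_LOW_FREQ, high=IN1_MAX))
    print("high frequency:", limit_alarms(VCORE_HIGH_FREQ, high=IN1_MAX))
```

With a limit between the two voltages, only the high-frequency reading raises an alarm, which would match a problem that appears and disappears with the CPU frequency.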

Can you please provide the output of "sensors" at both high and low CPU frequency?

Please also provide the output of "/sbin/lspci -x -s 00:00.0".

This problem may be solved by a BIOS update, so please make sure that you are
both using the latest available BIOS from Asus.
Comment 21 Dimitri 2006-04-28 03:04:48 UTC
Thank you so much, Delvare. i updated my BIOS and everything's fine now. In
fact, i had updated my BIOS right after i bought the laptop. I sent it back to
asus for repair after a while and it came back with the problem. i thought they
had updated my BIOS to a newer but faulty version. It never occurred to me that they
had downgraded it.
Comment 22 Pavel Sanda 2006-04-28 08:13:08 UTC
Dimitri, what is the exact number of your current BIOS version ?
Comment 23 Dimitri 2006-04-28 23:29:09 UTC
205AS, downloaded from the official website.
Comment 24 Jean Delvare 2006-06-04 06:30:45 UTC
Pavel, any news? Did you upgrade your BIOS too? If you did, did it help?
Comment 25 Pavel Sanda 2006-06-04 10:26:40 UTC
hi, i have been using BIOS version 206 the whole time (i have an M6Va, not an M9V). i was able
to work out most of the things i wanted, so i don't play with sensors anymore...
Comment 26 Len Brown 2006-08-10 00:59:20 UTC
This looks like an error that is provoked by the sensors driver.

Probably /proc/interrupts will show the acpi line increasing at a high rate
due to an SCI that is provoked by a GPE, and ACPI tries over and
over to turn off the GPE and fails.

If this is the case, I suppose we could add a feature to Linux/ACPI
where it tries to recognize that it is unable to squelch a GPE,
but it isn't clear how to tell the difference between that and
a valid source of a lot of GPEs...

I think in this case, you have to decide to run the sensor driver,
or to run ACPI, but not run both.
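
One way to check for such an interrupt storm is to sample the ACPI interrupt counter twice and compare. The sketch below is in Python and assumes the counter meant here is the line named "acpi" in /proc/interrupts; the function names are made up for illustration.

```python
import time


def acpi_interrupt_count(interrupts_text):
    """Sum the per-CPU counters on the line named "acpi"; None if absent."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        # Handler names may be comma separated when an IRQ is shared.
        names = [field.strip(",") for field in fields[1:]]
        if "acpi" not in names:
            continue
        total = 0
        for field in fields[1:]:
            if not field.isdigit():
                break  # the counters end where the controller name starts
            total += int(field)
        return total
    return None


def acpi_interrupts_per_second(path="/proc/interrupts", interval=1.0):
    """Sample the counter twice and return the rate, or None if not found."""
    with open(path) as f:
        before = acpi_interrupt_count(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = acpi_interrupt_count(f.read())
    if before is None or after is None:
        return None
    return (after - before) / interval


if __name__ == "__main__":
    print("acpi interrupts per second:", acpi_interrupts_per_second())
```

Comparing the rate before and after running "sensors -s" would show whether kacpid is being driven by a flood of SCIs.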
Comment 27 Jean Delvare 2007-03-07 02:02:29 UTC
Since lm_sensors version 2.10.1, all set statements are commented out by default
so this kind of problem should no longer happen, at least not until the user
starts tinkering with the configuration file.

Also, one reporter said that the problem was fixed by upgrading the BIOS, which
suggests that part of the problem was due to a broken BIOS.
