Bug 15106
Summary: | kacpi_notify_wq and kacpi_wq use a lot of cpu - hp-compaq 610 laptop | ||
---|---|---|---|
Product: | ACPI | Reporter: | Ali Gholami Rudi (aliqrudi) |
Component: | EC | Assignee: | acpi_ec |
Status: | REJECTED UNREPRODUCIBLE | ||
Severity: | normal | CC: | astarikovskiy, lenb, rui.zhang |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.32-rc5 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | acpidump output |
Description
Ali Gholami Rudi
2010-01-22 19:11:14 UTC
Created attachment 24674 [details]
acpidump output
please add gpe_event_info->gpe_number to printk in gpe functions... please also try to insert an msleep() before call of acpi_os_execute in ev_asynch_execute_gpe_method() The high cpu-usage case didn't happen yet. But, as mentioned, sometimes when I reboot the machine the fan works fullspeed while /sys/devices/virtual/thermal/cooling*/cur_state show that all fans are off (echoing 1 and the 0 to one of them fixes the problem). With the debug messages, when I boot normally nothing is printed and "dmesg | grep" returns nothing. But once when I rebooted I got: === cmd ==== # dmesg | grep gpe [ 1.421575] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 1.432020] acpi_ev_asynch_enable_gpe gpe=2 [ 1.432069] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 1.443020] acpi_ev_asynch_enable_gpe gpe=2 [ 1.514204] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 1.526017] acpi_ev_asynch_enable_gpe gpe=2 [ 1.561500] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 1.573019] acpi_ev_asynch_enable_gpe gpe=2 [ 1.579249] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 1.591020] acpi_ev_asynch_enable_gpe gpe=2 [ 1.614070] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 1.626019] acpi_ev_asynch_enable_gpe gpe=2 [ 1.664005] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 1.676019] acpi_ev_asynch_enable_gpe gpe=2 [ 1.742839] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 1.755020] acpi_ev_asynch_enable_gpe gpe=2 [ 2.213431] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 2.230083] acpi_ev_asynch_enable_gpe gpe=2 [ 2.233125] sda6acpi_ev_asynch_execute_gpe_method: gpe=2 [ 2.247068] acpi_ev_asynch_enable_gpe gpe=2 [ 2.263346] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 2.274348] acpi_ev_asynch_enable_gpe gpe=2 [ 2.279730] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 2.294067] acpi_ev_asynch_enable_gpe gpe=2 [ 2.296725] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 2.310094] acpi_ev_asynch_enable_gpe gpe=2 [ 2.323665] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 2.337208] acpi_ev_asynch_enable_gpe gpe=2 [ 2.339898] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 2.353080] acpi_ev_asynch_enable_gpe gpe=2 [ 2.355600] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 2.369208] acpi_ev_asynch_enable_gpe gpe=2 [ 2.371756] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 2.385080] acpi_ev_asynch_enable_gpe gpe=2 [ 2.387613] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 2.401102] acpi_ev_asynch_enable_gpe gpe=2 [ 2.403596] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 2.417079] acpi_ev_asynch_enable_gpe gpe=2 [ 2.419548] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 2.433101] acpi_ev_asynch_enable_gpe gpe=2 [ 2.435317] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 2.448226] acpi_ev_asynch_enable_gpe gpe=2 [ 2.450326] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 2.463095] acpi_ev_asynch_enable_gpe gpe=2 [ 2.465132] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 2.476213] acpi_ev_asynch_enable_gpe gpe=2 [ 2.478265] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 2.495480] acpi_ev_asynch_enable_gpe gpe=2 [ 2.497799] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 2.512067] acpi_ev_asynch_enable_gpe gpe=2 [ 2.514293] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 2.527197] acpi_ev_asynch_enable_gpe gpe=2 [ 2.529309] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 2.542080] acpi_ev_asynch_enable_gpe gpe=2 [ 2.549565] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 2.563095] acpi_ev_asynch_enable_gpe gpe=2 === cmd ==== But the fan problem didn't happen. It might mean the usleep() call fixed that problem or it was unrelated to that bug. I'll wait till that high-cpu usage case reappears (I'm checking dmesg instead of the cpu-usage now, because the usleep() call might prevent the high cpu usage). Thanks. I'm almost sure the usleep() fixed the fan problem when rebooting. But today I noticed the cpu temperature is sometimes wrong (a bios bug?) /sys/virtual/thermal/thermal_zone3/temperature was 28 degrees in my four hours of uptime; but the wind coming out was warm (I set the "turn on the fan when on AC power" option in the bios). Then I rebooted it, and it showed 42 degrees. And now the fan is constantly on at one of the lower speeds (a bit higher than the default on AC power level) and the temperature fixed at 42 degrees. Also sometime ago I experienced some delay in updating the temperature when I used "therma.act=45"; the temperature suddenly jumped from 45 to 53. Then I tried using thermal.tzp, but AFAIR it didn't help much but I'm not sure about it. Is there something try to make the temperature more accurate? Just to summarize, there are three problems: * the high cpu usage of kacpi_wq (if it happens I'll submit the gpe values in dmesg trace) * the fan turns on at full speed after rebooting while /sys/../cooling_device?/cur_state show fans are all off. Seems to be fixed after inserting usleep() as you suggested, but not sure yet. * the wrong temperature Thanks. Hi, Now my cpu temperature is not updated after boot; that's strange. I remember the temperature used to get updated, though there were noticeable jumps. I'm not sure if anything in the mainline caused this behavior (I'll try to find the time to try an older kernel). Since I can't rely on the temperature, when I feel the cpu is getting warm, I turn on the fan manually by echoing in cooling_deviceX/cur_state in sysfs. I hope this gets fixed :-) Anyway, none of the gpe_methods were called in the last week. It might be related to the above problem. Thanks. Hi Alexey, Now I see kacpi_notify high cpu usage again. I think the reason is now I use the machine in a warmer environment than the last few weeks. The cpu temperature update delay still exists: the reported temperature remains constant after boot until it reaches acpi active cooling thermal zone. As you suggested, I've added gpe_number to printk messages and added a msleep(10) call in acpi_ev_asynch_execute_gpe_method(). I see a lot of included messages. kacpi_notify cpu usage has decreased to 10-20% (I think due to the msleep() call). Thanks, Ali === gpe_number debug messages ==== [ 6529.162788] acpi_ev_notify_dispatch [ 6529.168096] acpi_ev_asynch_enable_gpe gpe=2 [ 6529.168177] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 6529.168547] acpi_ev_notify_dispatch [ 6529.170921] acpi_ev_notify_dispatch [ 6529.179067] acpi_ev_asynch_enable_gpe gpe=2 [ 6529.187114] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 6529.187479] acpi_ev_notify_dispatch [ 6529.192863] acpi_ev_notify_dispatch [ 6529.198095] acpi_ev_asynch_enable_gpe gpe=2 [ 6529.198175] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 6529.198547] acpi_ev_notify_dispatch [ 6529.209081] acpi_ev_asynch_enable_gpe gpe=2 [ 6529.209123] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 6529.209484] acpi_ev_notify_dispatch [ 6529.214917] acpi_ev_notify_dispatch [ 6529.220082] acpi_ev_asynch_enable_gpe gpe=2 [ 6529.247565] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 6529.247932] acpi_ev_notify_dispatch [ 6529.254050] acpi_ev_notify_dispatch [ 6529.259041] acpi_ev_asynch_enable_gpe gpe=2 [ 6529.259087] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 6529.259447] acpi_ev_notify_dispatch [ 6529.261819] acpi_ev_notify_dispatch [ 6529.270082] acpi_ev_asynch_enable_gpe gpe=2 [ 6529.308007] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 6529.308383] acpi_ev_notify_dispatch [ 6529.314198] acpi_ev_notify_dispatch [ 6529.319189] acpi_ev_asynch_enable_gpe gpe=2 [ 6529.319236] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 6529.319598] acpi_ev_notify_dispatch [ 6529.321966] acpi_ev_notify_dispatch [ 6529.330082] acpi_ev_asynch_enable_gpe gpe=2 [ 6529.619435] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 6529.619803] acpi_ev_notify_dispatch [ 6529.625930] acpi_ev_notify_dispatch [ 6529.630965] acpi_ev_asynch_enable_gpe gpe=2 [ 6529.631054] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 6529.631418] acpi_ev_notify_dispatch [ 6529.633790] acpi_ev_notify_dispatch [ 6529.642083] acpi_ev_asynch_enable_gpe gpe=2 [ 6529.661485] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 6529.661850] acpi_ev_notify_dispatch [ 6529.667065] acpi_ev_notify_dispatch [ 6529.672066] acpi_ev_asynch_enable_gpe gpe=2 [ 6529.672112] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 6529.672473] acpi_ev_notify_dispatch [ 6529.674840] acpi_ev_notify_dispatch [ 6529.683067] acpi_ev_asynch_enable_gpe gpe=2 [ 6529.694340] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 6529.694704] acpi_ev_notify_dispatch [ 6529.700126] acpi_ev_notify_dispatch [ 6529.705170] acpi_ev_asynch_enable_gpe gpe=2 [ 6529.705214] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 6529.705575] acpi_ev_notify_dispatch [ 6529.707946] acpi_ev_notify_dispatch [ 6529.716081] acpi_ev_asynch_enable_gpe gpe=2 [ 6529.767924] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 6529.768311] acpi_ev_notify_dispatch [ 6529.774284] acpi_ev_notify_dispatch [ 6529.779274] acpi_ev_asynch_enable_gpe gpe=2 [ 6529.779330] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 6529.779691] acpi_ev_notify_dispatch [ 6529.782081] acpi_ev_notify_dispatch [ 6529.790081] acpi_ev_asynch_enable_gpe gpe=2 [ 6529.790126] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 6529.790489] acpi_ev_notify_dispatch [ 6529.801066] acpi_ev_asynch_enable_gpe gpe=2 [ 6529.838882] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 6529.839272] acpi_ev_notify_dispatch [ 6529.844494] acpi_ev_notify_dispatch [ 6529.850057] acpi_ev_asynch_enable_gpe gpe=2 [ 6529.850135] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 6529.850509] acpi_ev_notify_dispatch [ 6529.852883] acpi_ev_notify_dispatch [ 6529.861066] acpi_ev_asynch_enable_gpe gpe=2 [ 6529.867786] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 6529.868167] acpi_ev_notify_dispatch [ 6529.873553] acpi_ev_notify_dispatch [ 6529.879095] acpi_ev_asynch_enable_gpe gpe=2 [ 6529.879174] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 6529.879545] acpi_ev_notify_dispatch [ 6529.881915] acpi_ev_notify_dispatch [ 6529.890066] acpi_ev_asynch_enable_gpe gpe=2 [ 6530.079347] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 6530.079718] acpi_ev_notify_dispatch [ 6530.085094] acpi_ev_notify_dispatch [ 6530.090116] acpi_ev_asynch_enable_gpe gpe=2 [ 6530.090159] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 6530.090519] acpi_ev_notify_dispatch [ 6530.092886] acpi_ev_notify_dispatch [ 6530.101082] acpi_ev_asynch_enable_gpe gpe=2 [ 6530.108256] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 6530.108622] acpi_ev_notify_dispatch [ 6530.114163] acpi_ev_notify_dispatch [ 6530.119165] acpi_ev_asynch_enable_gpe gpe=2 [ 6530.119212] acpi_ev_asynch_execute_gpe_method: gpe=2 [ 6530.119572] acpi_ev_notify_dispatch [ 6530.121943] acpi_ev_notify_dispatch [ 6530.130066] acpi_ev_asynch_enable_gpe gpe=2 What kernel do you use now? Could you check if latest git version works any better? I'm using the latest git version (b0483e78). After observing it for some time, I think I know how it happens (I don't know much about acpi, so sorry for wrong conclusions): The high cpu usage happens on 43^C. That is the first active cooling temperature. It seems on this temperature the bios generates lots of acpi events and at each event the acpi driver does the same actions again. Below or above this temperature I can control the fan through /sys but at 43^C, it is reset very fast (most of the time so fast that I cannot notice a change in fan speed). About the reported temperature change delay, I think the bios reports the new temperature only when it thinks it is useful to the driver, hence the temperature is not updated till it reaches 43. When it becomes cooler it gets updated to 42 (it won't report a smaller number since it thinks the driver should not care if the temperature is 42^c or say 20^c, since both are lower than the first trip point). The same holds for higher temperatures; it reports 43 till it reaches 53^c. So I think the main bug is in the bios which generates many acpi events when it is close to a trip point temperature. If so, how can it be fixed? Is it possible to ignore acpi events when they are very close to the last one? Or are there better solutions? Thanks, Ali hm, did you install latest BIOS too? My bios version is F.05 (07/28/2009). Had a look at 610 updates page: http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDownloadEventHandler.jsp?redirectReason=SWD_FTP_Request&swItem=ob-71707-1&prodSeriesId=3958407&prodLine=KV&targetPage=ftp%3A%2F%2Fftp.hp.com%2Fpub%2Fsoftpaq%2Fsp43501-44000/sp43692.exe&filesize=2227072 There are bios updates but this page doesn't mention such a fix: http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&prodTypeId=321957&prodSeriesId=3958407&swItem=ob-78864-1&prodNameId=3958408&swEnvOID=1093&swLang=8&taskId=135&mode=5 Updating the bios might help, but AFAICT bios upgrade requires windows which is very sad since I don't have it. Thanks, Ali does the problem still exist in the latest upstream kernel, say 2.6.35 or 2.6.36-rc? Not having windows to update the bios, I decided to pass thermal.off=1 to the kernel and manage fan speed manually. So I don't know if it is fixed or not. I'll retry using ACPI in a few days. By the way, switching to x86 (I used to use a 64-bit kernel) did fix one of my problems (I'm afraid I don't remember what it was exactly, maybe something related to the screen backlight), so it may be the bios misbahaving in a 64-bit kernel. But we'll see. I don't know if it's due to switching to a x86 kernel or a change in acpi code, now the high cpu usage doesn't happen after a day of testing. The lazy temperature updating still exists but I guess that's a bios issue (it reports 42^C first and then stays at 43^C and the fan speed changes to one). please feel free to re-open the bug if the problem can be reproduced. |