Distribution: SUSE 9.[23] AMD64 Hardware Environment: Asus L5D Software Environment: SUSE 9.[23] AMD64 Problem Description: ACPI battery monitor is very slow to respond and it does not return any numbers, eg rafael@albercik:/proc/acpi/battery/BAT0> time cat state present:
Created attachment 5092 [details] Patch to revert the changes that introduced the problem This patch reverts the patch that introduced the problem. It is against 2.6.12-rc3-mm2 but applies up to and including 2.6.12-rc5-mm1.
The problem is not present in 2.6.12-rc5-mm2
Hi, Do you use sbs-cm package ? Thanks, Luming
No, I don't. Why? Rafael
The problem is back in 2.6.13-rc1-mm1.
Hi, We need to figure out why burst-mode EC patch doesn't work for you. Please build kernel with ACPI debug option on, and echo -n 0x00100000 > /proc/acpi/debug_layer echo -n 0xffffffff > /proc/acpi/debug_level Then test and send me dmesg. Thanks, Luming
Created attachment 5267 [details] dmesg output from 2.6.13-rc1-mm1 w/ ACPI debug Done. I tested it by executing 'cat /proc/acpi/battery/BAT0/state'. I'm attaching the full output of dmesg (just in case). Greets, Rafael
Thanks for this info. Please run : #cat /proc/acpi/embedded_controller/EC/info if use global lock yes then please try this debug patch. --- ec.0 2005-07-05 21:43:52.000000000 +0800 +++ ec.c 2005-07-05 21:48:06.000000000 +0800 @@ -802,7 +802,8 @@ acpi_driver_data(device) = ec; /* Use the global lock for all EC transactions? */ - acpi_evaluate_integer(ec->handle, "_GLK", NULL, &ec->global_lock); +// acpi_evaluate_integer(ec->handle, "_GLK", NULL, &ec->global_lock); + ec->global_lock = 0; /* If our UID matches the UID for the ECDT-enumerated EC, we now have the *real* EC info, so kill the makeshift one.*/ @@ -1005,7 +1006,7 @@ status = acpi_evaluate_integer(handle, "_GPE", NULL, &ec_ecdt- >gpe_bit); if (ACPI_FAILURE(status)) return status; - ec_ecdt->global_lock = TRUE; + ec_ecdt->global_lock = 0; ec_ecdt->handle = handle; printk(KERN_INFO PREFIX "GPE=0x%02x, ports=0x%2x, 0x%2x\n", @@ -1084,7 +1085,7 @@ ec_ecdt->data_addr = ecdt_ptr->ec_data; ec_ecdt->gpe_bit = ecdt_ptr->gpe_bit; /* use the GL just to be safe */ - ec_ecdt->global_lock = TRUE; + ec_ecdt->global_lock = 0; ec_ecdt->uid = ecdt_ptr->uid; status = acpi_get_handle(NULL, ecdt_ptr->ec_id, &ec_ecdt->handle);
rafael@albercik:~> cat /proc/acpi/embedded_controller/EC0/info gpe bit: 0x25 ports: 0x66, 0x62 use global lock: no
Then, try this debug patch. --- ec.c 2005-07-06 14:35:55.000000000 +0800 +++ ec.c.1 2005-07-06 14:42:32.000000000 +0800 @@ -280,14 +280,6 @@ if (ec->global_lock) acpi_release_global_lock(glk); - if(atomic_read(&ec->leaving_burst) == 2){ - ACPI_DEBUG_PRINT((ACPI_DB_INFO,"aborted, retry ...\n")); - while(atomic_read(&ec->pending_gpe)){ - msleep(1); - } - acpi_enable_gpe(NULL, ec->gpe_bit, ACPI_NOT_ISR); - goto retry; - } return_VALUE(status); } @@ -363,14 +355,6 @@ if (ec->global_lock) acpi_release_global_lock(glk); - if(atomic_read(&ec->leaving_burst) == 2){ - ACPI_DEBUG_PRINT((ACPI_DB_INFO,"aborted, retry ...\n")); - while(atomic_read(&ec->pending_gpe)){ - msleep(1); - } - acpi_enable_gpe(NULL, ec->gpe_bit, ACPI_NOT_ISR); - goto retry; - } return_VALUE(status); } @@ -466,11 +450,6 @@ if (ec->global_lock) acpi_release_global_lock(glk); - if(atomic_read(&ec->leaving_burst) == 2){ - ACPI_DEBUG_PRINT((ACPI_DB_INFO,"aborted, retry ...\n")); - acpi_enable_gpe(NULL, ec->gpe_bit, ACPI_NOT_ISR); - status = -ENODATA; - } return_VALUE(status); }
Created attachment 5283 [details] dmesg output from 2.6.13-rc1-mm1 w/ debug patch w/ ACPI debug Done. The behavior has not changed but the messages in dmesg are different (attached).
Please attach the output of acpidmp.
Created attachment 5284 [details] acpidmp output from 2.6.13-rc1-mm1 w/ debug patch
To track down the problem, I want to see the results with the following chanages that remove ec query. --- ec.c 2005-07-06 14:35:55.000000000 +0800 +++ ec.c.2 2005-07-06 22:18:51.000000000 +0800 @@ -497,18 +497,6 @@ ACPI_FUNCTION_TRACE("acpi_ec_gpe_query"); - if (acpi_ec_read_status(ec) & ACPI_EC_FLAG_SCI) - result = acpi_ec_query(ec, &value); - - if (result) - goto end; - - object_name[2] = hex[((value >> 4) & 0x0F)]; - object_name[3] = hex[(value & 0x0F)]; - - ACPI_DEBUG_PRINT((ACPI_DB_INFO, "Evaluating %s\n", object_name)); - - acpi_evaluate_object(ec->handle, object_name, NULL, NULL); end: atomic_dec(&ec->pending_gpe); return; ~
Done. The behavior has not changed, the output of dmesg has not changed (ie. see Comment #11).
Hmm, strange, Let's measure acpi_ec_read and acpi_ec_write : --- ec.c 2005-07-06 23:32:10.000000000 +0800 +++ ec.c.3 2005-07-06 23:33:26.000000000 +0800 @@ -65,6 +65,7 @@ #define ACPI_EC_BURST_DISABLE 0x83 #define ACPI_EC_COMMAND_QUERY 0x84 +extern u64 jiffies; static int acpi_ec_add (struct acpi_device *device); static int acpi_ec_remove (struct acpi_device *device, int type); static int acpi_ec_start (struct acpi_device *device); @@ -234,6 +235,7 @@ u32 glk; ACPI_FUNCTION_TRACE("acpi_ec_read"); + printk("...jiffies = %lu \n", jiffies); if (!ec || !data) return_VALUE(-EINVAL); @@ -289,6 +291,7 @@ goto retry; } + printk("...jiffies = %lu \n", jiffies); return_VALUE(status); } @@ -305,6 +308,7 @@ ACPI_FUNCTION_TRACE("acpi_ec_write"); + printk("...jiffies = %lu \n", jiffies); if (!ec) return_VALUE(-EINVAL); retry: @@ -372,6 +376,7 @@ goto retry; } + printk("...jiffies = %lu \n", jiffies); return_VALUE(status); }
Created attachment 5291 [details] dmesg output from 2.6.13-rc1-mm1 w/ timing patch I have removed the declaration of jiffies, because the kernel did not compile with it. Additionally, I've figured out that I should revert the first debug patch. The resulting dmesg output is attached.
Thanks, to make the problem more clearly, could you build kernel with ACPI debug option on, and echo -n 0x00100000 > /proc/acpi/debug_layer echo -n 0xffffffff > /proc/acpi/debug_level Then, I could spot bottleneck.
Created attachment 5293 [details] dmesg output from 2.6.13-rc1-mm1 w/ timing patch w/ debug It is complied with the ACPI debug, but I forgot the "echo" things last time. The (full) dmesg output is attached.
> From log, I found there is a long delay between EC write and read > operations. It is about 0.6s. > So, it is possible to take 15s to finish all EC operations for a battery > status query. > Here are some snippets : > > ...jiffies = 4295177380 > acpi_ec-0379 [111E] [24] acpi_ec_write : ----Exit- > 0000000000000000 > acpi_ec-0664 [111E] [23] acpi_ec_space_handler : ----Exit- > 0000000000000000 > acpi_ec-0608 [111E] [22] acpi_ec_space_handler : ----Entry > acpi_ec-0236 [111E] [23] acpi_ec_read : ----Entry > ...jiffies = 4295178019 > > ....................... > ...................... > ................. > > ...jiffies = 4295178105 > acpi_ec-0379 [111E] [24] acpi_ec_write : ----Exit- > 0000000000000000 > acpi_ec-0664 [111E] [23] acpi_ec_space_handler : ----Exit- > 0000000000000000 > acpi_ec-0608 [111E] [22] acpi_ec_space_handler : ----Entry > acpi_ec-0236 [111E] [23] acpi_ec_read : ----Entry > ...jiffies = 4295178727
> To debug further, Could you try this debug patch to skip all > Acquire/Release statement in AML interpreter. And retest. > I expect to see less delay. I want to see these debug info: 1.#time cat state 2.# dmesg like comment#19 --- osl.c 2005-05-07 13:20:31.000000000 +0800 +++ osl.c.1 2005-07-08 09:26:09.000000000 +0800 @@ -792,11 +792,6 @@ ACPI_DEBUG_PRINT ((ACPI_DB_MUTEX, "Acquiring spinlock[%p] from %s level\n", handle, ((flags & ACPI_NOT_ISR) ? "non-interrupt" : "interrupt"))); - if (flags & ACPI_NOT_ISR) - ACPI_DISABLE_IRQS(); - - spin_lock((spinlock_t *)handle); - return_VOID; } @@ -814,11 +809,6 @@ ACPI_DEBUG_PRINT ((ACPI_DB_MUTEX, "Releasing spinlock[%p] from %s level\n", handle, ((flags & ACPI_NOT_ISR) ? "non-interrupt" : "interrupt"))); - spin_unlock((spinlock_t *)handle); - - if (flags & ACPI_NOT_ISR) - ACPI_ENABLE_IRQS(); - return_VOID; }
Created attachment 5302 [details] dmesg output from 2.6.13-rc1-mm1 w/ timing patch w/ debug rafael@albercik:~> cd /proc/acpi/battery/BAT0/ rafael@albercik:/proc/acpi/battery/BAT0> time cat state present: yes capacity state: ok charging state: charged present rate: unknown remaining capacity: unknown present voltage: unknown real 0m27.630s user 0m0.004s sys 0m4.212s
I'm sorry, I made a mistake with my last debug patch. Could you test this one: --- osl.c 2005-07-06 14:35:55.000000000 +0800 +++ osl.c.1 2005-07-11 13:36:16.000000000 +0800 @@ -899,6 +899,7 @@ int ret = 0; ACPI_FUNCTION_TRACE ("os_wait_semaphore"); +#if 0 if (!sem || (units < 1)) return_ACPI_STATUS (AE_BAD_PARAMETER); @@ -963,6 +964,7 @@ else { ACPI_DEBUG_PRINT ((ACPI_DB_MUTEX, "Acquired semaphore[%p|%d|%d] \n", handle, units, timeout)); } +#endif return_ACPI_STATUS (status); } @@ -981,6 +983,7 @@ ACPI_FUNCTION_TRACE ("os_signal_semaphore"); +#if 0 if (!sem || (units < 1)) return_ACPI_STATUS (AE_BAD_PARAMETER); @@ -990,6 +993,7 @@ ACPI_DEBUG_PRINT ((ACPI_DB_MUTEX, "Signaling semaphore[%p|%d]\n", handle, units)); up(sem); +#endif return_ACPI_STATUS (AE_OK); } ~ ~
Created attachment 5309 [details] dmesg output from 2.6.13-rc1-mm1 w/ timing patch + debug patch Done. Now I've got some nice Oopses and a segfault during reboot: rafael@albercik:~> cd /proc/acpi/battery/BAT0/ rafael@albercik:/proc/acpi/battery/BAT0> time cat state Killed real 0m0.472s user 0m0.000s sys 0m0.012s rafael@albercik:/proc/acpi/battery/BAT0> Message from syslogd@albercik at Mon Jul 11 23:20:39 2005 ... albercik kernel: Oops: 0000 [3] PREEMPT Message from syslogd@albercik at Mon Jul 11 23:20:40 2005 ... albercik kernel: CR2: 0000000000001008 The output of dmesg is attached.
Did you config PREEMPT ? If yes, Please try disable it with a clean kernel and retest.
Created attachment 5313 [details] dmesg output from 2.6.13-rc1-mm1 w/ timing patch + debug patch w/o preemption With CONFIG_PREEMPT_NONE=y rafael@albercik:~> cd /proc/acpi/battery/BAT0/ rafael@albercik:/proc/acpi/battery/BAT0> time cat state present: yes capacity state: ok charging state: charged present rate: unknown remaining capacity: unknown present voltage: unknown real 0m18.840s user 0m0.000s sys 0m6.340s
The problem has made it to mainline, as it is presend in 2.6.13-rc3.
Created attachment 5327 [details] patch against the disassembly of DSDT Please try apply this patch to DSDT, and override DSDT, and test kernel w/ timing patch.
Created attachment 5328 [details] override DSDT Or, you can just include this file to override the original DSDT, and test with timing patch.
Created attachment 5331 [details] dmesg output from 2.6.13-rc1-mm1 w/ override DSDT + timing patch I used the DSDT from Comment #29. This time the timing is much better but the rate, capacity and voltage of the battery are still not recognized: rafael@albercik:~> cd /proc/acpi/battery/BAT0/ rafael@albercik:/proc/acpi/battery/BAT0> time cat state present: yes capacity state: ok charging state: charged present rate: unknown remaining capacity: unknown present voltage: unknown real 0m0.515s user 0m0.004s sys 0m0.500s
Created attachment 5333 [details] override DSDT -take 2 Please try this override DSDT -- take 2.
Created attachment 5334 [details] dmesg output from 2.6.13-rc1-mm1 w/ override DSDT -take 2 + timing patch It seems to be better, but still not right. For the system plugged-in: rafael@albercik:~> cd /proc/acpi/battery/BAT0/ rafael@albercik:/proc/acpi/battery/BAT0> time cat state present: yes capacity state: critical charging state: charged present rate: 0 mA remaining capacity: -14 mAh present voltage: unknown real 0m0.600s user 0m0.000s sys 0m0.580s Do you need the numbers I get without the EC burst mode patch?
Created attachment 5366 [details] patch for falling back to polling mode with kernel option : ec_polling The patch is against linux-2.6.13-rc3. With this patch, if you use kernel boot option ec_polling, the ec driver will fall back to polling mode, which suppose to work on L5D, otherwise, ec driver will use burst mode. Hopefully it will workaround your problem. The real fix is under-going. Ps. I have tested it on my T42. Both mode seems to work. Please tested it. Thanks Luming
Created attachment 5368 [details] ec_polling patch against vanilla 2.6.13-rc3 The patch works for me. :-) Thanks a lot for it! Initially it didn't apply to vanilla 2.6.13-rc3, so I've remade it a little (the resulting patch is attached). Greets, Rafael
applied to-linus
Created attachment 5497 [details] patch vs 2.6.13-rc5 This patch makes polling mode the default again. use "ec_burst=1" if you'd like to try burst mode, but it has enough problems that it will not be the default till they're addressed. This patch went into linus's tree after 2.6.13-rc5 -- closing.
As a follow up about bust-mode, could you try debug patch filed at bug 3851
Referring to Comment #37: Sorry for the delay. I have tested the patch at: http://bugzilla.kernel.org/attachment.cgi?id=5506&action=view I applied it to 2.6.13-rc5, booted the kernel without the ec_polling parameter, and it worked.
Hi, I just updated the patch at bug 3851, please help testing. Thanks, Luming
I'm currently testing the patch at: http://bugzilla.kernel.org/attachment.cgi?id=5574&action=view I have booted with ec_burst=1. It generally works, but I get "present rate" different to zero and the system is on AC power for a couple of hours.
Referring to Comment #40: Now "present rate" is zero. It must have been a fluctuation then.
*** Bug 1093 has been marked as a duplicate of this bug. ***
*** Bug 4793 has been marked as a duplicate of this bug. ***
*** Bug 4911 has been marked as a duplicate of this bug. ***
*** Bug 4846 has been marked as a duplicate of this bug. ***
*** Bug 5100 has been marked as a duplicate of this bug. ***