Bug 5008 - TP 600X os_wait_semaphore errors from setting polling frequency
TP 600X os_wait_semaphore errors from setting polling frequency
Status: REJECTED DUPLICATE of bug 5121
Product: ACPI
Classification: Unclassified
Component: Power-Thermal
i386 Linux
: P2 normal
Assigned To: Luming Yu
:
: 4967 (view as bug list)
Depends on:
Blocks: 5062
  Show dependency treegraph
 
Reported: 2005-08-06 01:09 UTC by Sanjoy Mahajan
Modified: 2005-09-15 01:33 UTC (History)
1 user (show)

See Also:
Kernel Version: 2.6.13-rc6
Tree: Mainline
Regression: ---


Attachments
C program to generate semaphore error (383 bytes, text/x-csrc)
2005-08-06 01:11 UTC, Sanjoy Mahajan
Details
debugging output for 'time acpi' with ec_burst=1 (78.42 KB, text/x-log)
2005-08-12 08:37 UTC, Sanjoy Mahajan
Details
debug msgs for 'time acpi' (811.26 KB, text/x-log)
2005-08-12 10:42 UTC, Sanjoy Mahajan
Details
debug msgs for 'time acpi' (2nd try) (104.12 KB, text/x-log)
2005-08-12 21:17 UTC, Sanjoy Mahajan
Details

Description Sanjoy Mahajan 2005-08-06 01:09:20 UTC
Distribution: Debian testing
Hardware Environment: TP 600X
Software Environment:
Problem Description: Running the attached C program will generate these errors,
as well as drive the load up to 5 or 6:

  osl-0965 [2790] os_wait_semaphore     : Failed to acquire
semaphore[e3fce600|1|0], AE_TIME

The errors seem correlated with 'acpi -t' hanging for a fraction of a second.

The attached program repeatedly sets the polling frequency to 10, as fast as it
can.  It was the most reliable way to regenerate the semaphore errors, though
they often happen using acpi or acpi -t, or when waking up from S3.  See Bug
#4967 (which also has an acpidmp).

Steps to reproduce:
Comment 1 Sanjoy Mahajan 2005-08-06 01:11:54 UTC
Created attachment 5522 [details]
C program to generate semaphore error

THM2 happens to be the name of one of the thermal zones in the TP600X (it has
THM0, THM2, THM6, THM7).  Probably different in other laptops.
Comment 2 Sanjoy Mahajan 2005-08-09 15:47:51 UTC
It happens with -rc6 too, so I changed the kernel version accordingly.

With -rc6, once the semaphore errors show up, acpi can hang for a few seconds.
Here's one of the worse examples:

$ time acpi
     Battery 1: charged, 86%
     Battery 2: discharging, 68%, 00:54:38 remaining

real	0m3.428s
user	0m0.001s
sys	0m1.594s

The system time is a lot, almost 50% of the actual time (spinlocking on a
semaphore?).
Comment 3 Luming Yu 2005-08-12 00:36:59 UTC
Please try linux-2.6.13-rc6 with patch filed at bug 3851. Pleast test it with  
ec_burst=1 and ec_burst=0.   
Comment 4 Sanjoy Mahajan 2005-08-12 07:16:10 UTC
I get the semaphore errors even with that patch, booting either with no ec_burst
option (which should be like ec_burst=0 ?) or with ec_burst=1.  (This is with
the preemptible kernel.)
Comment 5 Luming Yu 2005-08-12 07:18:27 UTC
This AE_TIME error is just a warning. Please ignore it.
what about "time acpi"?
Comment 6 Sanjoy Mahajan 2005-08-12 07:22:34 UTC
acpi still takes a while.  Here's a typical 'time acpi' (similar for
no ec_burst or for ec_burst=1):

$ time acpi
     Battery 1: charged, 100%
     Battery 2: charged, 98%

real    0m1.390s
user    0m0.000s
sys     0m1.360s

-Sanjoy

Comment 7 Luming Yu 2005-08-12 07:25:06 UTC
please do:

echo 0x10 > debug_layer
echo 0x10 > debug_level

and send out dmesg after run "time acpi"
Comment 8 Sanjoy Mahajan 2005-08-12 07:29:54 UTC
> send out dmesg after run "time acpi"

Even with those debug values, that produced no dmesgs.

-Sanjoy

Comment 9 Luming Yu 2005-08-12 07:36:36 UTC
echo -n 0x4100000 > debug_layer
echo -n 0xffffffff > debug_level
Comment 10 Sanjoy Mahajan 2005-08-12 07:46:19 UTC
Hmm, acpi debugging seems a bit broken on this kernel, which may be
why I'm not getting any dmesgs.  No matter what I echo to debug_layer
or debug_level, the values stay the same:

# cat /proc/acpi/debug_layer
Description                     Hex        SET
ACPI_UTILITIES                  0x00000001 [*]
ACPI_HARDWARE                   0x00000002 [*]
ACPI_EVENTS                     0x00000004 [*]
ACPI_TABLES                     0x00000008 [*]
ACPI_NAMESPACE                  0x00000010 [*]
ACPI_PARSER                     0x00000020 [*]
ACPI_DISPATCHER                 0x00000040 [*]
ACPI_EXECUTER                   0x00000080 [*]
ACPI_RESOURCES                  0x00000100 [*]
ACPI_CA_DEBUGGER                0x00000200 [*]
ACPI_OS_SERVICES                0x00000400 [*]
ACPI_CA_DISASSEMBLER            0x00000800 [*]
ACPI_COMPILER                   0x00001000 [*]
ACPI_TOOLS                      0x00002000 [*]
ACPI_ALL_DRIVERS                0xFFFF0000 [*]
--
debug_layer = 0xFFFF3FFF (* = enabled, - = partial)

# cat /proc/acpi/debug_level
Description                     Hex        SET
ACPI_LV_ERROR                   0x00000001 [*]
ACPI_LV_WARN                    0x00000002 [*]
ACPI_LV_INIT                    0x00000004 [*]
ACPI_LV_DEBUG_OBJECT            0x00000008 [*]
ACPI_LV_INFO                    0x00000010 [ ]
ACPI_LV_INIT_NAMES              0x00000020 [ ]
ACPI_LV_PARSE                   0x00000040 [ ]
ACPI_LV_LOAD                    0x00000080 [ ]
ACPI_LV_DISPATCH                0x00000100 [ ]
ACPI_LV_EXEC                    0x00000200 [ ]
ACPI_LV_NAMES                   0x00000400 [ ]
ACPI_LV_OPREGION                0x00000800 [ ]
ACPI_LV_BFIELD                  0x00001000 [ ]
ACPI_LV_TABLES                  0x00002000 [ ]
ACPI_LV_VALUES                  0x00004000 [ ]
ACPI_LV_OBJECTS                 0x00008000 [ ]
ACPI_LV_RESOURCES               0x00010000 [ ]
ACPI_LV_USER_REQUESTS           0x00020000 [ ]
ACPI_LV_PACKAGE                 0x00040000 [ ]
ACPI_LV_ALLOCATIONS             0x00100000 [ ]
ACPI_LV_FUNCTIONS               0x00200000 [ ]
ACPI_LV_OPTIMIZATIONS           0x00400000 [ ]
ACPI_LV_MUTEX                   0x01000000 [ ]
ACPI_LV_THREADS                 0x02000000 [ ] # 

In the debug_level output, it doesn't tell me the current state (it
finishes the 'cat' prematurely and then gives me back my shell prompt,
the '#' at the end of the last line of the output).  I've noticed that
problem a few times before, but I never had a problem changing the
debug values.

Here is /proc/cmdline:

BOOT_IMAGE=+three-patches ro root=305 idebus=66 apm=off acpi=force
pci=noacpi console=ttyS0,115200 console=tty0 acpi_sleep=s3_bios ec_burst=1

The 'three patches' in the BOOT_IMAGE are the S3 sleep hang workaround
from shaohua.li@intel.com for bug #5037 (didn't work but that's
another story), the specific-hotkey patch to deal with bug #5035, and
the ec.c patch to deal with bug #5008 and #5049.

-Sanjoy

Comment 11 Sanjoy Mahajan 2005-08-12 07:49:15 UTC
Oh, I just found the problem (which may be another bug):

echo -n 0x10 > /proc/acpi/debug_level

does NOT change the settings, whereas

echo 0x10 > /proc/acpi/debug_level

works.

Maybe some sscanf assuming there's a return at the end?

I'll now redo the acpi test and send a dmesg.

-Sanjoy

Comment 12 Luming Yu 2005-08-12 07:55:10 UTC
Strange, It works for me. My version is 2.6.13-rc5.
Comment 13 Sanjoy Mahajan 2005-08-12 07:59:22 UTC
And now it's working for me again (except for the debug_level output
being truncated, but that's a longstanding problem for me).  It's very
strange.  Anyway back to the dmesgs...

-Sanjoy

Comment 14 Sanjoy Mahajan 2005-08-12 08:00:40 UTC
With 0x10 as the debug_level and debug_layer:

Execute Method: [\_SB_.PCI0.ISA0.EC0_.BAT1._BST] (Node c156f248)
Execute Method: [\_SB_.PCI0.ISA0.EC0_.BAT1._BIF] (Node c156f288)
Execute Method: [\_SB_.PCI0.ISA0.EC0_.BAT0._BST] (Node c156f608)
Execute Method: [\_SB_.PCI0.ISA0.EC0_.BAT0._BIF] (Node c156f648)

-Sanjoy

Comment 15 Luming Yu 2005-08-12 08:07:38 UTC
Please try 
echo 0x100000 > debug_layer 
echo 0xffffffff > debug_level

And post dmesg
Comment 16 Sanjoy Mahajan 2005-08-12 08:10:02 UTC
> Please try 
> echo 0x100000 > debug_layer 
> echo 0xffffffff > debug_level

Can't right now: Now the echo -n is not working again, and neither is
echo (which probably wasn't working before, it's just that changing
the setting started working between my testing 'echo -n' and 'echo').
Notice how in the transcript below debug_layer stays at 0x10 no matter
what:

# cat /proc/acpi/debug_layer
Description			Hex        SET
ACPI_UTILITIES			0x00000001 [ ]
ACPI_HARDWARE			0x00000002 [ ]
ACPI_EVENTS			0x00000004 [ ]
ACPI_TABLES			0x00000008 [ ]
ACPI_NAMESPACE			0x00000010 [*]
ACPI_PARSER			0x00000020 [ ]
ACPI_DISPATCHER			0x00000040 [ ]
ACPI_EXECUTER			0x00000080 [ ]
ACPI_RESOURCES			0x00000100 [ ]
ACPI_CA_DEBUGGER		0x00000200 [ ]
ACPI_OS_SERVICES		0x00000400 [ ]
ACPI_CA_DISASSEMBLER		0x00000800 [ ]
ACPI_COMPILER			0x00001000 [ ]
ACPI_TOOLS			0x00002000 [ ]
ACPI_ALL_DRIVERS		0xFFFF0000 [ ]
--
debug_layer = 0x00000010 (* = enabled, - = partial)
# echo -n 0x4100000 > debug_layer
# cat /proc/acpi/debug_layer
Description			Hex        SET
ACPI_UTILITIES			0x00000001 [ ]
ACPI_HARDWARE			0x00000002 [ ]
ACPI_EVENTS			0x00000004 [ ]
ACPI_TABLES			0x00000008 [ ]
ACPI_NAMESPACE			0x00000010 [*]
ACPI_PARSER			0x00000020 [ ]
ACPI_DISPATCHER			0x00000040 [ ]
ACPI_EXECUTER			0x00000080 [ ]
ACPI_RESOURCES			0x00000100 [ ]
ACPI_CA_DEBUGGER		0x00000200 [ ]
ACPI_OS_SERVICES		0x00000400 [ ]
ACPI_CA_DISASSEMBLER		0x00000800 [ ]
ACPI_COMPILER			0x00001000 [ ]
ACPI_TOOLS			0x00002000 [ ]
ACPI_ALL_DRIVERS		0xFFFF0000 [ ]
--
debug_layer = 0x00000010 (* = enabled, - = partial)
# echo 0x4100000 > debug_layer
# cat /proc/acpi/debug_layer
Description                     Hex        SET
ACPI_UTILITIES                  0x00000001 [ ]
ACPI_HARDWARE                   0x00000002 [ ]
ACPI_EVENTS                     0x00000004 [ ]
ACPI_TABLES                     0x00000008 [ ]
ACPI_NAMESPACE                  0x00000010 [*]
ACPI_PARSER                     0x00000020 [ ]
ACPI_DISPATCHER                 0x00000040 [ ]
ACPI_EXECUTER                   0x00000080 [ ]
ACPI_RESOURCES                  0x00000100 [ ]
ACPI_CA_DEBUGGER                0x00000200 [ ]
ACPI_OS_SERVICES                0x00000400 [ ]
ACPI_CA_DISASSEMBLER            0x00000800 [ ]
ACPI_COMPILER                   0x00001000 [ ]
ACPI_TOOLS                      0x00002000 [ ]
ACPI_ALL_DRIVERS                0xFFFF0000 [ ]
--
debug_layer = 0x00000010 (* = enabled, - = partial)

Comment 17 Luming Yu 2005-08-12 08:13:43 UTC
Please file another bug for debug issue.
Thanks.
Comment 18 Sanjoy Mahajan 2005-08-12 08:34:06 UTC
It's Bug #5054
Comment 19 Sanjoy Mahajan 2005-08-12 08:37:53 UTC
Created attachment 5619 [details]
debugging output for 'time acpi' with ec_burst=1

Setting debug_level and debug_layer started working again, so I took the
opportunity to get this debug output from running 'time acpi' (which took 13
seconds with all the debugging generated).  I used this script (I set the
thermal polling frequencies to 0 before running it so that those msgs wouldn't
clutter the debug log, and then set them back to 10 at the end with the
'quiet-fan' script):

#!/bin/bash
echo 0x100000 > debug_layer
echo 0xffffffff > debug_level
cat /proc/acpi/debug_layer
cat /proc/acpi/debug_level
sleep 1
logger BEGIN ACPI DEBUGGING
time acpi
logger END ACPI DEBUGGING
echo 0x10 > debug_layer
echo 0x10 > debug_level
quiet-fan
Comment 20 Luming Yu 2005-08-12 09:00:33 UTC
Please try this to collect timing info along with debug_layer=0x00100000 
debug_level=0xffffffff

--- ec.c.0      2005-08-12 23:53:17.000000000 +0800
+++ ec.c        2005-08-12 23:56:34.000000000 +0800
@@ -328,6 +328,7 @@
        u8                      address,
        u32                     *data)
 {
+        printk("...jiffies = %lu \n", jiffies);
        if (acpi_ec_polling_mode)
                return acpi_ec_polling_read(ec, address, data);
        else
@@ -339,6 +340,7 @@
        u8                      address,
        u8                      data)
 {
+        printk("...jiffies = %lu \n", jiffies);
        if (acpi_ec_polling_mode)
                return acpi_ec_polling_write(ec, address, data);
        else
@@ -607,6 +609,7 @@
        union acpi_ec           *ec,
        u32                     *data)
 {
+        printk("...jiffies = %lu \n", jiffies);
        if (acpi_ec_polling_mode)
                return acpi_ec_polling_query(ec, data);
        else
Comment 21 Sanjoy Mahajan 2005-08-12 10:42:29 UTC
Created attachment 5620 [details]
debug msgs for 'time acpi'

acpi debugging is working fine -- my stupdity (see Bug #5054).	The attached
log file is from running 'time acpi' with debug_layer=0x00100000 and
debug_level=0xffffffff and with the ec jiffies patch.
Comment 22 Luming Yu 2005-08-12 20:36:12 UTC
Hi, the log file is not with debug_layer = 0x100000
Please retest. thanks.
Comment 23 Sanjoy Mahajan 2005-08-12 21:17:57 UTC
Created attachment 5622 [details]
debug msgs for 'time acpi' (2nd try)

Sorry, my mistake.  Here's another attempt.  This time it should be right.  The
script I ran to collect the debugging said this before running 'time acpi',
which took 13 seconds (real and system) with all the debugging:

debug_layer = 0x00100000 (* = enabled, - = partial)
Comment 24 Luming Yu 2005-09-06 01:12:07 UTC
From the log, The jiffes intervals between two sequential ec operations  
(acpi_ec_read , acpi_ec_write)is about 40. And the log doesn't show anything  
abnormal. The root cause of the slowness in accessing battery status could be 
releated to the mutex--I2CM. 
 
Or, this is not acpi driver bug. 
Comment 25 Konstantin Karasyov 2005-09-06 04:14:57 UTC
*** Bug 4967 has been marked as a duplicate of this bug. ***
Comment 26 Luming Yu 2005-09-15 00:25:45 UTC
Please build kernel 2.6.14-rc1 with acpi debug option disabled.
boot kernel with ec_burst=1. 
Comment 27 Luming Yu 2005-09-15 01:17:08 UTC
I tested on a sony T26 with kernel of acpi debug option enabled , and kernel  
of acpi debug option disabled.  
  
acpi debug option enabled:  
real: 0m0.013s  
user: 0m0.000s  
sys:  0m0.012s  
  
acpi debug option disabled:  
real: 0m0.003s  
user: 0m0.000s  
sys:  0m0.000s  
  
So, I believe you should disable acpi debug option.   
If it is still a issue, please reopen.  
  
Thanks.  
  
	  
Comment 28 Luming Yu 2005-09-15 01:33:38 UTC

*** This bug has been marked as a duplicate of 5121 ***
Comment 29 Sanjoy Mahajan 2005-11-02 18:41:02 UTC
> So, I believe you should disable acpi debug option.   

I just started running 2.6.14 (with acpi debugging) and it is much
faster than 2.6.13 (also with acpi debugging).  With 2.6.14:

real    0m0.041s
user    0m0.004s
sys     0m0.020s

Which is more than fast enough.  Whereas with 2.6.13, it would often
hang for half a second or more.  So I don't think acpi debugging was
the problem, but I also don't know how 2.6.14 improved the situation.


Note You need to log in before you can comment on or make changes to this bug.