Bug 1669

Summary: ACPI global lock macros
Product: ACPI Reporter: Luming Yu (luming.yu)
Component: Power-OtherAssignee: Luming Yu (luming.yu)
Severity: normal CC: acpi-bugzilla
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.4 2.6 Subsystem:
Regression: --- Bisected commit-id:
Attachments: a patch for fixing this issue on i386 platform
c version
x86_64 version

Description Luming Yu 2003-12-10 23:18:07 UTC
Paul Menage [menage@google.com] said:

The ACPI_ACQUIRE_GLOBAL_LOCK() macro in include/asm-i386/acpi.h looks a 
little odd:

     do { \
         int dummy; \
         asm("1:     movl (%1),%%eax;" \
             "movl   %%eax,%%edx;" \
             "andl   %2,%%edx;" \
             "btsl   $0x1,%%edx;" \
             "adcl   $0x0,%%edx;" \
             "lock;  cmpxchgl %%edx,(%1);" \
             "jnz    1b;" \
             "cmpb   $0x3,%%dl;" \
             "sbbl   %%eax,%%eax" \
             :"=a"(Acq),"=c"(dummy):"c"(GLptr),"i"(~1L):"dx"); \
     } while(0)

When compiled, it results in:

  266:   mov    0x0,%ecx
                         268: R_386_32   acpi_gbl_common_fACS
  26c:   mov    (%ecx),%eax
  26e:   mov    %eax,%edx
  270:   and    %ecx,%edx
  272:   bts    $0x1,%edx
  276:   adc    $0x0,%edx
  279:   lock cmpxchg %edx,(%ecx)
  27d:   jne    26c <acpi_ev_acquire_global_lock+0x2f>
  27f:   cmp    $0x3,%dl
  282:   sbb    %eax,%eax

So at location 270 we mask %edx with %ecx, which is the address of the 
global lock. Unless the global lock is aligned on a 2-byte but not 
4-byte boundary, which seems a little unlikely, then this is going to 
clear both the owned and the pending bits in %edx, so we'll always think 
that the lock is not owned. Shouldn't the andl be masking with %3 (which 
is initialised as ~1) rather than %2 (the address of the lock)?

Given the comments above the definition, I'm guessing that the "dummy" 
parameter was added later for some reason (to tell gcc that ecx would 
get clobbered? - but it doesn't seem to be clobbered), and the parameter 
substitutions in the asm weren't updated. Unless I'm missing something 
fundamental, shouldn't the definition be something more like this:

     do { \
         asm volatile("1:movl   (%1),%%eax;" \
             "movl   %%eax,%%edx;" \
             "andl   %2,%%edx;" \
             "btsl   $0x1,%%edx;" \
             "adcl   $0x0,%%edx;" \
             "lock;  cmpxchgl %%edx,(%1);" \
             "jnz    1b;" \
             "cmpb   $0x3,%%dl;" \
             "sbbl   %0,%0" \
             :"=r"(Acq):"r"(GLptr),"i"(~1L):"dx", "ax"); \
     } while(0)

which compiles to:

  2e5:   mov    0x0,%ecx
                         2e7: R_386_32   acpi_gbl_common_fACS
  2eb:   mov    (%ecx),%eax
  2ed:   mov    %eax,%edx
  2ef:   and    $0xfffffffe,%edx
  2f2:   bts    $0x1,%edx
  2f6:   adc    $0x0,%edx
  2f9:   lock cmpxchg %edx,(%ecx)
  2fd:   jne    2eb <acpi_ev_acquire_global_lock+0x37>
  2ff:   cmp    $0x3,%dl
  302:   sbb    %cl,%cl

which is identical to the ACPI spec reference implementation, apart from 
returning the result in %cl rather than %al (since we're cleanly 
separating clobbered registers from input/output params, and letting gcc 
choose the param registers).

Alternatively it could be defined in C (as in ia64) which would reduce 
the likelihood of asm bugs. (Although it wouldn't be safe to use 
__cmpxchg(), as that uses LOCK_PREFIX which is empty on UP, rather than 
an explicit "lock").

ACPI_RELEASE_GLOBAL_LOCK(), and the x86_64 variants of these, seem to 
have similar issues.

Comment 1 Luming Yu 2003-12-10 23:21:39 UTC
Created attachment 1658 [details]
a patch for fixing this issue on i386 platform

This patch is based on paul 's proposal patch, and correct one error in that
Comment 2 Len Brown 2004-02-23 12:46:41 UTC
The existing code is definately incorrect, and will result in ACPI_ACQUIRE_GLOBAL_LOCK() to 
always succeed -- even if the lock was held. 
The patch assembles correctly and matches the ACPI spec. 
But Luming, I'm curious why you didn't include Paul's suggestion of letting gcc choose the output 
register and instead left it hard-coded to eax? eg. 
             "sbbl   %0,%0" \ 
             :"=r"(Acq) ... 
I've decoded most of the asm syntax, but still don't understand what the last parameter means.  
-  ...:"dx"); 
+ ...:"dx", "ax"); 
Is it _really_ impossible to do this in C?  ia64 and x86_64 seem to be able to pull it off... 
Comment 3 Luming Yu 2004-02-23 22:08:17 UTC
Patch filed here is not the latest one, please take a look at
Comment 4 Luming Yu 2004-02-23 22:50:43 UTC

  Regarding hardcode register in Inline assembly code, I just prefer 
programmer has more control over register allocation. Otherwise, there could 
be some potential bugs introduced by GCC.

  Regarding why not use C code, I think of Inline assembly code is unable to 
get removed cleanly. Because we need instructions like "lock; cmpxchgl.."

  Of cause, we can minimalize the usage of Inlie assemly code like IA64.

Comment 5 Len Brown 2004-02-23 23:26:47 UTC
This isn't performance critical -- it is "correctness" critical;-) 
So I'd like to see a solution in C like ia64 and x86_64, with the assembly only used for the exotic 
atomic instructions. 
Comment 6 Luming Yu 2004-02-26 05:01:00 UTC
Created attachment 2238 [details]
c version

This is c version for i386. It is a little bit different with current X86_64
version. I try to copy code from include/asm-x86_64/acpi.h, 

static inline __u32 cmpxchg4_locked(__u32 *ptr, __u32 old, __u32 new)
	asm volatile("lock ; cmpxchgl %k1,%2" :
		     "=r" (new) : "0" (old), "m" (*(__u32 *)ptr) : "memory");
	return new;

static inline int
__acpi_acquire_global_lock (unsigned int *lock)
	unsigned int old, new, val;
	do {
		old = *lock;
		new = (((old & ~0x3) + 2) + ((old >> 1) & 0x1));
		val = cmpxchg4_locked(lock, new, old);
	} while (unlikely (val != old));
	return (new < 3) ? -1 : 0;

but gcc generate the following bad code.

c01fc0d8:	8b 13			mov    (%ebx),%edx
c01fc0da:	89 d0			mov    %edx,%eax
c01fc0dc:	89 d1			mov    %edx,%ecx
c01fc0de:	d1 e8			shr    %eax
c01fc0e0:	83 e1 fc		and    $0xfffffffc,%ecx
c01fc0e3:	83 e0 01		and    $0x1,%eax
c01fc0e6:	8d 4c 08 02		lea    0x2(%eax,%ecx,1),%ecx
c01fc0ea:	89 c8			mov    %ecx,%eax
c01fc0ec:	f0 0f b1 03		lock cmpxchg %eax,(%ebx)
c01fc0f0:	39 d0			cmp    %edx,%eax

Maybe current X86_64 version is wrong. We need confirm it using X86_64
Comment 7 Suresh B Siddha 2004-02-26 15:26:23 UTC
> -----Original Message-----
> From: Brown, Len 
> Suresh,
> Any chance you can dis-assemble an invocation of 
> __acpi_acquire_global_lock() on an x86_64 machine -- as built 
> with the native compiler?

Disassembly of x86_64 portion:

ffffffff8023a952:       8b 16                   mov    (%rsi),%edx
ffffffff8023a954:       89 d0                   mov    %edx,%eax
ffffffff8023a956:       89 d1                   mov    %edx,%ecx
ffffffff8023a958:       d1 e8                   shr    %eax
ffffffff8023a95a:       83 e1 fc                and    $0xfffffffffffffffc,%ecx
ffffffff8023a95d:       83 e0 01                and    $0x1,%eax
ffffffff8023a960:       8d 4c 08 02             lea    0x2(%rax,%rcx,1),%ecx
ffffffff8023a964:       89 c8                   mov    %ecx,%eax
ffffffff8023a966:       f0 0f b1 06             lock cmpxchg %eax,(%rsi)
ffffffff8023a96a:       39 d0                   cmp    %edx,%eax
ffffffff8023a96c:       75 e4                   jne    ffffffff8023a952 
ffffffff8023a96e:       83 f9 03                cmp    $0x3,%ecx
ffffffff8023a971:       19 c0                   sbb    %eax,%eax
Comment 8 Luming Yu 2004-02-26 17:07:13 UTC
My question is around here:
ffffffff8023a964:       89 c8                   mov    %ecx,%eax
ffffffff8023a966:       f0 0f b1 06             lock cmpxchg %eax,(%rsi)

According to IA32 instruction reference, eax should hold old value.
Comment 9 Luming Yu 2004-02-26 17:34:33 UTC
Created attachment 2239 [details]
x86_64 version

this is patch for X86_64. Who can help testing?
Comment 10 Suresh B Siddha 2004-02-27 00:04:25 UTC
Ok. It now generates correct code.

ffffffff8023a956:       8b 0e                   mov    (%rsi),%ecx
ffffffff8023a958:       89 c8                   mov    %ecx,%eax
ffffffff8023a95a:       89 ca                   mov    %ecx,%edx
ffffffff8023a95c:       d1 e8                   shr    %eax
ffffffff8023a95e:       83 e2 fc                and    $0xfffffffffffffffc,%edx
ffffffff8023a961:       83 e0 01                and    $0x1,%eax
ffffffff8023a964:       8d 7c 10 02             lea    0x2(%rax,%rdx,1),%edi
ffffffff8023a968:       89 c8                   mov    %ecx,%eax
ffffffff8023a96a:       89 fa                   mov    %edi,%edx
ffffffff8023a96c:       f0 0f b1 16             lock cmpxchg %edx,(%rsi)
ffffffff8023a970:       39 c1                   cmp    %eax,%ecx
ffffffff8023a972:       75 e2                   jne    ffffffff8023a956

And kernel boots fine on my system.
Comment 11 Luming Yu 2004-02-29 22:41:00 UTC
The following is disassembly code from IA64:
a000000100317910:       0b 10 00 22 10 10       [MMI]       ld4 r2=[r17];;
a000000100317916:       b0 e0 0b 58 44 00                   and r11=-4,r2
a00000010031791c:       22 10 00 52                         extr.u r16=r2,1,1;;
a000000100317920:       0a 50 2c 20 00 20       [MMI]       add r10=r11,r16;;
a000000100317926:       00 00 00 02 00 e0                   nop.m 0x0
a00000010031792c:       21 50 00 84                         adds r15=2,r10
a000000100317930:       0b 00 08 40 2a 04       [MMI]       mov.m ar.ccv=r2;;
a000000100317936:       90 78 44 22 20 00                   cmpxchg4.acq
a00000010031793c:       00 00 04 00                         nop.i 0x0;;
a000000100317940:       10 00 00 00 01 00       [MIB]       nop.m 0x0
a000000100317946:       90 48 08 10 71 04                   cmp4.eq p9,p8=r9,r2
a00000010031794c:       d0 ff ff 4b                   (p08) br.cond.dpnt.few
a000000100317910 <acpi_ev_global_lock_handler+0xf0>
Comment 12 Len Brown 2004-04-13 23:40:38 UTC
applied to 2.4.26 and 2.6.5 -- please close.