Paul Menage [menage@google.com] said:

The ACPI_ACQUIRE_GLOBAL_LOCK() macro in include/asm-i386/acpi.h looks a little odd:

    #define ACPI_ACQUIRE_GLOBAL_LOCK(GLptr, Acq) \
        do { \
            int dummy; \
            asm("1: movl (%1),%%eax;" \
                "movl %%eax,%%edx;" \
                "andl %2,%%edx;" \
                "btsl $0x1,%%edx;" \
                "adcl $0x0,%%edx;" \
                "lock; cmpxchgl %%edx,(%1);" \
                "jnz 1b;" \
                "cmpb $0x3,%%dl;" \
                "sbbl %%eax,%%eax" \
                :"=a"(Acq),"=c"(dummy):"c"(GLptr),"i"(~1L):"dx"); \
        } while(0)

When compiled, it results in:

    266: mov 0x0,%ecx
         268: R_386_32 acpi_gbl_common_fACS
    26c: mov (%ecx),%eax
    26e: mov %eax,%edx
    270: and %ecx,%edx
    272: bts $0x1,%edx
    276: adc $0x0,%edx
    279: lock cmpxchg %edx,(%ecx)
    27d: jne 26c <acpi_ev_acquire_global_lock+0x2f>
    27f: cmp $0x3,%dl
    282: sbb %eax,%eax

So at location 270 we mask %edx with %ecx, which is the address of the global lock. Unless the global lock is aligned on a 2-byte but not 4-byte boundary, which seems a little unlikely, this is going to clear both the owned and the pending bits in %edx, so we'll always think that the lock is not owned.

Shouldn't the andl be masking with %3 (which is initialised as ~1) rather than %2 (the address of the lock)? Given the comments above the definition, I'm guessing that the "dummy" parameter was added later for some reason (to tell gcc that %ecx would get clobbered? -- but it doesn't seem to be clobbered), and the parameter substitutions in the asm weren't updated.

Unless I'm missing something fundamental, shouldn't the definition be something more like this:

    #define ACPI_ACQUIRE_GLOBAL_LOCK(GLptr, Acq) \
        do { \
            asm volatile("1:movl (%1),%%eax;" \
                "movl %%eax,%%edx;" \
                "andl %2,%%edx;" \
                "btsl $0x1,%%edx;" \
                "adcl $0x0,%%edx;" \
                "lock; cmpxchgl %%edx,(%1);" \
                "jnz 1b;" \
                "cmpb $0x3,%%dl;" \
                "sbbl %0,%0" \
                :"=r"(Acq):"r"(GLptr),"i"(~1L):"dx", "ax"); \
        } while(0)

which compiles to:

    2e5: mov 0x0,%ecx
         2e7: R_386_32 acpi_gbl_common_fACS
    2eb: mov (%ecx),%eax
    2ed: mov %eax,%edx
    2ef: and $0xfffffffe,%edx
    2f2: bts $0x1,%edx
    2f6: adc $0x0,%edx
    2f9: lock cmpxchg %edx,(%ecx)
    2fd: jne 2eb <acpi_ev_acquire_global_lock+0x37>
    2ff: cmp $0x3,%dl
    302: sbb %cl,%cl

which is identical to the ACPI spec reference implementation, apart from returning the result in %cl rather than %al (since we're cleanly separating clobbered registers from input/output params, and letting gcc choose the param registers).

Alternatively, it could be defined in C (as on ia64), which would reduce the likelihood of asm bugs. (Although it wouldn't be safe to use __cmpxchg(), as that uses LOCK_PREFIX, which is empty on UP, rather than an explicit "lock".)

ACPI_RELEASE_GLOBAL_LOCK(), and the x86_64 variants of these, seem to have similar issues.

Paul
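For readers following the asm: the bit manipulation it performs can be written as plain C. A minimal sketch, using GCC's __sync_val_compare_and_swap() builtin as a stand-in for an explicitly lock-prefixed cmpxchg (the builtin always emits the lock prefix, which sidesteps the UP LOCK_PREFIX problem Paul mentions); the function name here is illustrative, not the kernel's:

    /* Sketch of the global-lock acquire algorithm the asm implements.
     * The lock dword uses bit 0 = pending, bit 1 = owned. */
    static inline int acpi_acquire_global_lock_sketch(unsigned int *lock)
    {
        unsigned int old, new;

        do {
            old = *lock;
            /* Clear pending, set owned; if the lock was already
             * owned, the carried-over owned bit sets pending instead. */
            new = ((old & ~1u) | 2u) + ((old >> 1) & 1u);
        } while (__sync_val_compare_and_swap(lock, old, new) != old);

        /* (new & 3) == 2: we now own the lock;
         * (new & 3) == 3: it was already owned, we are marked pending
         * and must wait for the owner's release notification. */
        return (new & 3) == 2;
    }

This is exactly what the "cmpb $0x3,%dl; sbbl" tail tests: the final value is 2 if we acquired the lock and 3 if we only became pending.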
Created attachment 1658 [details] a patch for fixing this issue on the i386 platform

This patch is based on Paul's proposed patch, and corrects one error in that proposal.
The existing code is definitely incorrect, and will result in ACPI_ACQUIRE_GLOBAL_LOCK() always appearing to succeed -- even if the lock was held.

The patch assembles correctly and matches the ACPI spec. But Luming, I'm curious why you didn't include Paul's suggestion of letting gcc choose the output register, and instead left it hard-coded to %eax, e.g.:

    "sbbl %0,%0" \
    :"=r"(Acq) ...

I've decoded most of the asm syntax, but still don't understand what the last parameter means, i.e.:

    - ...:"dx");
    + ...:"dx", "ax");

Is it _really_ impossible to do this in C? ia64 and x86_64 seem to be able to pull it off...

thanks,
-Len
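For what it's worth, the list after the third colon in GCC extended asm is the clobber list: registers (plus "cc" and "memory") that the asm destroys, so gcc won't keep live values in them or allocate them to operands. A toy example, unrelated to the patch, with hypothetical names:

    /* Toy extended-asm example: outputs : inputs : clobbers.
     * Listing "edx" tells gcc the asm destroys %edx, much as the
     * "dx" (and the added "ax") entries do in the ACPI macro. */
    static inline unsigned int double_it(unsigned int x)
    {
        unsigned int out;
        asm("movl %1, %%edx;"
            "addl %%edx, %%edx;"
            "movl %%edx, %0"
            : "=r" (out)     /* %0: output, gcc picks the register */
            : "r" (x)        /* %1: input, gcc picks the register  */
            : "edx", "cc");  /* clobbered: %edx and the flags      */
        return out;
    }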
The patch filed here is not the latest one; please take a look at http://bugzilla.kernel.org/show_bug.cgi?id=1781#c10
Len,

Regarding hardcoding registers in inline assembly: I just prefer that the programmer have more control over register allocation; otherwise there could be some potential bugs introduced by gcc.

Regarding why not use C code: I think the inline assembly cannot be removed completely, because we need instructions like "lock; cmpxchgl ...". Of course, we can minimize the use of inline assembly, as ia64 does.

Thanks,
Luming
Luming,

This isn't performance critical -- it is "correctness" critical;-) So I'd like to see a solution in C like ia64 and x86_64, with the assembly only used for the exotic atomic instructions.

thanks,
-Len
Created attachment 2238 [details] c version

This is the C version for i386. It is a little different from the current x86_64 version. I tried to copy the code from include/asm-x86_64/acpi.h:

    static inline __u32 cmpxchg4_locked(__u32 *ptr, __u32 old, __u32 new)
    {
        asm volatile("lock ; cmpxchgl %k1,%2"
                     : "=r" (new)
                     : "0" (old), "m" (*(__u32 *)ptr)
                     : "memory");
        return new;
    }

    static inline int __acpi_acquire_global_lock (unsigned int *lock)
    {
        unsigned int old, new, val;
        do {
            old = *lock;
            new = (((old & ~0x3) + 2) + ((old >> 1) & 0x1));
            val = cmpxchg4_locked(lock, new, old);
        } while (unlikely (val != old));
        return (new < 3) ? -1 : 0;
    }

but gcc generates the following bad code:

    c01fc0d8: 8b 13        mov (%ebx),%edx
    c01fc0da: 89 d0        mov %edx,%eax
    c01fc0dc: 89 d1        mov %edx,%ecx
    c01fc0de: d1 e8        shr %eax
    c01fc0e0: 83 e1 fc     and $0xfffffffc,%ecx
    c01fc0e3: 83 e0 01     and $0x1,%eax
    c01fc0e6: 8d 4c 08 02  lea 0x2(%eax,%ecx,1),%ecx
    c01fc0ea: 89 c8        mov %ecx,%eax
    c01fc0ec: f0 0f b1 03  lock cmpxchg %eax,(%ebx)
    c01fc0f0: 39 d0        cmp %edx,%eax

Maybe the current x86_64 version is wrong. We need to confirm it with an x86_64 compiler.
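The problem is visible in the constraints: cmpxchgl implicitly compares against %eax, but "=r"/"0" let gcc put the expected old value in any register. One plausible repair, sketched here on the assumption that the fix is to pin the old value to %eax with the "a" constraint (this is a reconstruction, not the attached patch):

    static inline __u32 cmpxchg4_locked(__u32 *ptr, __u32 old, __u32 new)
    {
        /* cmpxchgl compares %eax with the destination, so the expected
         * old value must live in %eax ("+a"); on failure, %eax receives
         * the value actually observed in memory. */
        asm volatile("lock; cmpxchgl %2,%1"
                     : "+a" (old), "+m" (*ptr)
                     : "r" (new)
                     : "memory");
        return old;
    }

Note that the caller quoted above also passes (lock, new, old) into a (ptr, old, new) signature; with the helper fixed, the call should read cmpxchg4_locked(lock, old, new).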
> -----Original Message-----
> From: Brown, Len
>
> Suresh,
> Any chance you can dis-assemble an invocation of
> __acpi_acquire_global_lock() on an x86_64 machine -- as built
> with the native compiler?

Disassembly of the x86_64 portion:

    ffffffff8023a952: 8b 16        mov (%rsi),%edx
    ffffffff8023a954: 89 d0        mov %edx,%eax
    ffffffff8023a956: 89 d1        mov %edx,%ecx
    ffffffff8023a958: d1 e8        shr %eax
    ffffffff8023a95a: 83 e1 fc     and $0xfffffffffffffffc,%ecx
    ffffffff8023a95d: 83 e0 01     and $0x1,%eax
    ffffffff8023a960: 8d 4c 08 02  lea 0x2(%rax,%rcx,1),%ecx
    ffffffff8023a964: 89 c8        mov %ecx,%eax
    ffffffff8023a966: f0 0f b1 06  lock cmpxchg %eax,(%rsi)
    ffffffff8023a96a: 39 d0        cmp %edx,%eax
    ffffffff8023a96c: 75 e4        jne ffffffff8023a952 <acpi_ev_acquire_global_lock+0x7b>
    ffffffff8023a96e: 83 f9 03     cmp $0x3,%ecx
    ffffffff8023a971: 19 c0        sbb %eax,%eax
My question is around here:

    ffffffff8023a964: 89 c8        mov %ecx,%eax
    ffffffff8023a966: f0 0f b1 06  lock cmpxchg %eax,(%rsi)

According to the IA-32 instruction reference, %eax should hold the old value -- but here it is loaded with the new one (%ecx holds the freshly computed value from the lea).
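For reference, a small C model of what "lock cmpxchg reg, mem" does (names here are purely illustrative); it shows why loading %eax with the new value breaks the retry loop:

    /* Illustrative model of "lock cmpxchg reg, mem".  The implicit
     * %eax operand must hold the EXPECTED OLD value; the hardware
     * performs the whole sequence atomically. */
    static unsigned int model_cmpxchg(unsigned int *mem,
                                      unsigned int eax,  /* expected old */
                                      unsigned int reg)  /* new value */
    {
        unsigned int observed = *mem;
        if (observed == eax)
            *mem = reg;      /* ZF = 1: exchange performed */
        /* else ZF = 0, and the real instruction loads *mem into %eax */
        return observed;
    }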
Created attachment 2239 [details] x86_64 version

This is the patch for x86_64. Who can help test it?
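In the meantime, the lock-word transitions (bit 0 = pending, bit 1 = owned) can be sanity-checked in user space with something like the following hypothetical harness, which exercises only the arithmetic, not the atomicity:

    #include <assert.h>

    /* Next-state function from the C implementation above: clear both
     * low bits, set owned (+2), carry the old owned bit into pending. */
    static unsigned int next(unsigned int old)
    {
        return ((old & ~0x3u) + 2) + ((old >> 1) & 0x1u);
    }

    int main(void)
    {
        assert(next(0x0) == 0x2);   /* free          -> owned          */
        assert(next(0x2) == 0x3);   /* owned         -> owned|pending  */
        assert(next(0x3) == 0x3);   /* owned|pending -> unchanged      */
        return 0;
    }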
Ok. It now generates correct code:

    ffffffff8023a956: 8b 0e        mov (%rsi),%ecx
    ffffffff8023a958: 89 c8        mov %ecx,%eax
    ffffffff8023a95a: 89 ca        mov %ecx,%edx
    ffffffff8023a95c: d1 e8        shr %eax
    ffffffff8023a95e: 83 e2 fc     and $0xfffffffffffffffc,%edx
    ffffffff8023a961: 83 e0 01     and $0x1,%eax
    ffffffff8023a964: 8d 7c 10 02  lea 0x2(%rax,%rdx,1),%edi
    ffffffff8023a968: 89 c8        mov %ecx,%eax
    ffffffff8023a96a: 89 fa        mov %edi,%edx
    ffffffff8023a96c: f0 0f b1 16  lock cmpxchg %edx,(%rsi)
    ffffffff8023a970: 39 c1        cmp %eax,%ecx
    ffffffff8023a972: 75 e2        jne ffffffff8023a956

And the kernel boots fine on my system.
The following is the disassembly from ia64:

    a000000100317910: 0b 10 00 22 10 10 [MMI] ld4 r2=[r17];;
    a000000100317916: b0 e0 0b 58 44 00       and r11=-4,r2
    a00000010031791c: 22 10 00 52             extr.u r16=r2,1,1;;
    a000000100317920: 0a 50 2c 20 00 20 [MMI] add r10=r11,r16;;
    a000000100317926: 00 00 00 02 00 e0       nop.m 0x0
    a00000010031792c: 21 50 00 84             adds r15=2,r10
    a000000100317930: 0b 00 08 40 2a 04 [MMI] mov.m ar.ccv=r2;;
    a000000100317936: 90 78 44 22 20 00       cmpxchg4.acq r9=[r17],r15,ar.ccv
    a00000010031793c: 00 00 04 00             nop.i 0x0;;
    a000000100317940: 10 00 00 00 01 00 [MIB] nop.m 0x0
    a000000100317946: 90 48 08 10 71 04       cmp4.eq p9,p8=r9,r2
    a00000010031794c: d0 ff ff 4b       (p08) br.cond.dpnt.few a000000100317910 <acpi_ev_global_lock_handler+0xf0>
applied to 2.4.26 and 2.6.5 -- please close.