I have traced/bisected this problem to this commit: _____________________________________________________________________________ 568132624386f53e87575195d868db9afb2e9316 is the first bad commit commit 568132624386f53e87575195d868db9afb2e9316 Author: H. Peter Anvin <hpa@zytor.com> Date: Tue Jul 27 17:01:49 2010 -0700 x86: Add memory modify constraints to xchg() and cmpxchg() commit 113fc5a6e8c2288619ff7e8187a6f556b7e0d372 upstream. xchg() and cmpxchg() modify their memory operands, not merely read them. For some versions of gcc the "memory" clobber has apparently dealt with the situation, but not for all. Originally-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Glauber Costa <glommer@redhat.com> Cc: Avi Kivity <avi@redhat.com> Cc: Peter Palfrader <peter@palfrader.org> Cc: Greg KH <gregkh@suse.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Zachary Amsden <zamsden@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> LKML-Reference: <4C4F7277.8050306@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> :040000 040000 4bbfbd009db6441b0b30297acc5a04732b9c215e dadbcfd4cb8cb59f0ef3bef9bfc57880eb926179 M arch _____________________________________________________________________________ Reverting it on top of 2.6.35.2 fixes the problem. Please, let me know if you need any other information from me.
On Tue, Aug 17, 2010 at 2:06 AM, Artem S. Tashkinov <t.artem@mailcity.com> wrote: > > I have found a bad commit and filed a new bug 16612 report. Ok, interesting. That one should change no behavior at all, except for some very specific gcc problems. What version of gcc are you using? And would it be possible to test with another compiler version? Peter - the original report for this is mixed in with bugzilla 16588: https://bugzilla.kernel.org/show_bug.cgi?id=16588 and there's an odd page fault in memset() in the page allocator in that video. I'm not seeing anything wrong with the asm conversion, but a bisect and a revert says otherwise... Linus
On 08/17/2010 08:46 AM, Linus Torvalds wrote: > On Tue, Aug 17, 2010 at 2:06 AM, Artem S. Tashkinov > <t.artem@mailcity.com> wrote: >> >> I have found a bad commit and filed a new bug 16612 report. > > Ok, interesting. That one should change no behavior at all, except for > some very specific gcc problems. > > What version of gcc are you using? And would it be possible to test > with another compiler version? > > Peter - the original report for this is mixed in with bugzilla 16588: > > https://bugzilla.kernel.org/show_bug.cgi?id=16588 > > and there's an odd page fault in memset() in the page allocator in > that video. I'm not seeing anything wrong with the asm conversion, but > a bisect and a revert says otherwise... > OK, sounds indeed like a problem with a specific version of gcc. I'd like to get that root-caused, because unless we know what the actual constraint is we could have all kinds of similar problems without really knowing about them. The other thing that can be done, of course, is to break "+m" into "=m" ... "0" type references but I'd like to withhold judgement until I can reproduce it. The other possibility is that the __xg() hack is biting us here, in which case we should just remove it -- it's only ever was a poor hack to do what the "memory" clobber does properly, and it is entirely possible gcc has a real problem with it as an output operand. -hpa
I'm using *vanilla* version of GCC 4.4.4. My CPU is Intel Core i5 so supposedly GCC is optimizing for the core2 architecture. I'm now building the kernel using GCC 4.5.1 (vanilla), hold on.
On Tue, Aug 17, 2010 at 9:01 AM, Peter Anvin <hpa@zytor.com> wrote: > > The other thing that can be done, of course, is to > break "+m" into "=m" ... "0" type references but I'd like to withhold > judgement until I can reproduce it. Actually, no. You can't use a numbered back-reference like "0" with memory. Several (all?) versions of gcc will just blow up in your face, and allow numbered constraints only on registers. You can split a "+m" into a "=m" _and_ a regular "m", of course, but that at least historically generates really bad code (ie gcc often actually generates the address twice). Maybe gcc doesn't do that any more. But we do use "=m" in other places (atomics, bitops) etc. HOWEVER. Take a look at <asm/bitops.h> here: #if __GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 1) /* Technically wrong, but this avoids compilation errors on some gcc versions. */ #define BITOP_ADDR(x) "=m" (*(volatile long *) (x)) #else #define BITOP_ADDR(x) "+m" (*(volatile long *) (x)) #endif ie there we simply replace the "+m" with "=m" for gcc versions < 4.1. Maybe we should do the same thing for the xchg etc asms? Linus
On Tue, Aug 17, 2010 at 9:09 AM, Artem S. Tashkinov <t.artem@mailcity.com> wrote: > > I'm using *vanilla* version of GCC 4.4.4. Ok. In that case we already use "+m" for the bitops etc and have done that for a long time. So it's not the "old versions of gcc had trouble with +m" issue. Linus
When compiled with GCC 4.5.1 the kernel works beautifully. $ dmesg | head -n1 [ 0.000000] Linux version 2.6.35.2-ic (root@localhost.localdomain) (gcc version 4.5.1 (GCC) ) #3 SMP PREEMPT Tue Aug 17 22:10:42 YEKST 2010 $ uname -a Linux localhost.localdomain 2.6.35.2-ic #3 SMP PREEMPT Tue Aug 17 22:10:42 YEKST 2010 i686 i686 i386 GNU/Linux Probably this bug title needs to be changed in order to reflect that it's a GCC bug, not a kernel bug.
Okay, that's weird. When you say *vanilla* version, I presume you mean the 4.4.4 tarball from ftp.gnu.org... I'll see if I can reproduce it with that. That rather surprises me, I have to say, since Fedora's 4.4.4 has been my testing platform. It makes me wonder what Red Hat patches are in there and if there is something that addresses a specific bug.
I try to avoid using Fedora's GCC because they add too many patches which sometimes confuse developers both from GCC and other projects. And yes, I'm using a GCC compiled directly from gcc.gnu.org tarballs.
On Tue, Aug 17, 2010 at 9:19 AM, Artem S. Tashkinov <t.artem@mailcity.com> wrote: > > Probably this bug title needs to be changed in order to reflect that it's a > GCC > bug, not a kernel bug. Maybe. Possibly even "probably". But we've often had cases where we did something borderline, and some compiler versions didn't like it. Looking at the particular commit, though, it really should if anything be the other way around. The _old_ code was borderline, and the new code is unambiguous wrt what the compiler should do. So yes, it does sound like it is a gcc-4.4.4 issue. The fact that it doesn't happen with the Fedora version of gcc-4.4.4 strengthens that suspicion. It would be interesting to see what the difference between the Fedora version and the "plain" gnu.org compiler is. It might be the patches Fedora adds (which are quite likely real fixes: many gcc developers do work for RH), but it might also simply be things like default target options etc. Figuring out compiler bugs by comparing assembler code is usually a real pain. But it _might_ be easier if you compare compilers that are as close to each other as possible (ie the comparison between "plain gcc-4.4.4" and "Fedora 4.4.4" is likely easier than comparing 4.4.4 with 4.5.1). Linus
(In reply to comment #9) > > It would be interesting to see what the difference between the Fedora > version and the "plain" gnu.org compiler is. It might be the patches > Fedora adds (which are quite likely real fixes: many gcc developers do > work for RH), but it might also simply be things like default target > options etc. > > Figuring out compiler bugs by comparing assembler code is usually a > real pain. But it _might_ be easier if you compare compilers that are > as close to each other as possible (ie the comparison between "plain > gcc-4.4.4" and "Fedora 4.4.4" is likely easier than comparing 4.4.4 > with 4.5.1). > > Linus I could have done that myself but I have no idea an assembler output of which C files needs to be compared - cmpxchg(_32).h is included in too many places, xchg macro is used all over the place and if I'm not mistaken it's not possible to compile headers themselves.
On Tue, Aug 17, 2010 at 10:23 AM, Artem S. Tashkinov <t.artem@mailcity.com> wrote: > > I could have done that myself but I have no idea an assembler output of > which > C files needs to be compared - cmpxchg(_32).h is included in too many places, > xchg macro is used all over the place and if I'm not mistaken it's not > possible > to compile headers themselves. Compiling the headers on their own isn't even interesting. Almost all gcc bugs have historically been about subtle optimization problems that very much involve the code around the actual thing that gets miscompiled. IOW, it will depend on register pressure, previous register contents etc. But your oops does make me suspect a few things. First off, I _assume_ that your config includes CONFIG_PAE, and my guess is that your oops on memset() inside the page allocator is because some "kmap()" failed (ie the code was trying to map something in the highmem region and clear the page, but the mapping failed). An d that actually makes me suspect that it's the __set_64bit() change that triggers the problem. So what I would suggest you do to narrow this down a bit: - see _which_ part of the patch it is that triggers the problem. You're running on 32-bit, so the cmpxchg_64.h parts are uninteresting, but even the 32-bit parts can be split up into four (or even five) parts: the xchg fragment, the __set_64bit fragment, the cmpxchg part and the cmpxchg64 change (which you could split up for just the local and nonlocal cases too). So I happen to suspect (because of the oops) the __set_64bit one, but that's a pretty weak suspicion. But if you could try to "bisect" the patch itself by just trying to apply just half of it, to see _which_ part(s) make a difference, that would narrow it down a bit. - once you narrow it down to the minimal patch, narrowing it down to the particular file that fails is not that hard. Simply compile the kernel with and without that patch, and save all the object files in between. Then just see which object files change. That's still likely going to be too many to be convenient, but again the oops itself can give us a good hint. It's _likely_ related to the low-level memory mapping. Depending on the exact list, it might be possible to narrow it down to just one or two "very likely" cases. - Also, it's possible you could do some narrowing of those particular object files with different compiler versions. This is where it would be easier to use compilers that are almost the same version (ie plain gcc-4.4.4 and fedora-gcc-4.4.4), because _hopefully_ the compiler changes are so subtle that they don't change code generation for the common cases. But that's hard to guarantee. But if the previous thing has narrowed the cases down to just a handful of files, it would be interesting to compile _those_ files with the two compilers, and see if something stands out. But yeah. Finding compiler bugs is a HUGE pain. I can well understand if you decide it's too much work. Sometimes we just end up giving up and saying "ok, that compiler version has trouble". That said, even if you don't decide to try to follow up on it, it would probably be a good idea to open a gcc bugzilla entry. Maybe some gcc developer would just go "oh, known problem in gcc-4.4.4, the fix is so-and-so", and maybe he'd even have a known workaround to avoid the problem. That's a bit more likely to be the case exactly because apparently it got fixed in the Fedora version, so it probably _is_ some known bug. Linus
On Tue, Aug 17, 2010 at 10:44 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > That said, even if you don't decide to try to follow up on it, it > would probably be a good idea to open a gcc bugzilla entry. Maybe some > gcc developer would just go "oh, known problem in gcc-4.4.4, the fix > is so-and-so", and maybe he'd even have a known workaround to avoid > the problem. That's a bit more likely to be the case exactly because > apparently it got fixed in the Fedora version, so it probably _is_ > some known bug. Just emailed Jakub Jelinek. He says that the they (gcc developers) really do need a test-case that has been whittled down to a single object file. It also appears that the fedora gcc-4.4.4 isn't all that close to the original gcc-4.4.4 and there are hundreds of patches there, so nothing obvious sprung to mind to Jakub about what the problem would be. Linus
The "we need a test case" is the standard reply from gcc maintainers when something is wrong. I'm going to try to do a comparison and see if I can figure out some place where the differences are glaring, but it's possible that any real issue gets lost in the noise. set_64bit() isn't actually used in all that many places, unlike the xchg/cmpxchg ones, since it's a x86-specific API.
Created attachment 27493 [details] My 2.6.35.2 .config-uration Finding an object that's being miscompiled seems to be a challenging task, so I wonder if maybe some kernel debug options could help spot it.
Could you try 2.6.36-rc1?
I'm now running 2.6.36-rc1 beautifully. dmesg | head -n1 [ 0.000000] Linux version 2.6.36-rc1-ic (root@localhost.localdomain) (gcc version 4.4.4 (GCC) ) #1 SMP PREEMPT Wed Aug 18 11:10:47 YEKST 2010 And BTW 2.6.35.2 sometimes crashes as early as intel-agp module is being loaded.
Okay, so this is specific to the backport. Certainly explains why I didn't reproduce it. More tomorrow...
My guess is that this has something to do with locking around page table manipulation. I still don't have a setup to actually reproduce this problem, and looking at the object code directly doesn't help -- there is too much noise in the form of register changes and instruction movement to be able to draw any conclusive instructions. Artem, would you be willing to try 2.6.35.2, reverting *only* the __set_64bit part of the patch in question?
(In reply to comment #18) > My guess is that this has something to do with locking around page table > manipulation. I still don't have a setup to actually reproduce this problem, > and looking at the object code directly doesn't help -- there is too much > noise > in the form of register changes and instruction movement to be able to draw > any > conclusive instructions. > > Artem, would you be willing to try 2.6.35.2, reverting *only* the __set_64bit > part of the patch in question? Reverting this part of the patch has seemingly fixed the problem (I tried modprobing i915 module and it worked, I haven't tried to boot to the desktop).
On Wed, Aug 18, 2010 at 7:47 AM, <bugzilla-daemon@bugzilla.kernel.org> wrote: > > Reverting this part of the patch has seemingly fixed the problem (I tried > modprobing i915 module and it worked, I haven't tried to boot to the > desktop). Hmm. Instead of reverting it, maybe we should just take that cleanup/simplify that we have in mainline (commit 69309a059075: "x86, asm: Clean up and simplify set_64bit()"), since apparently 2.6.36-rc1 works for Artem, and that way we get _closer_ to the mainline code rather than further away. And it really is a simplification and cleanup. Linus
Created attachment 27541 [details] Test patch - against 2.6.35.2 Could you please try this against unmodified 2.6.35.2? I have a suspicion that this will at least improve the situation.
Created attachment 27701 [details] v2.6.35.2 w/ and w/o the "-m" output
Managed to reproduce the problem (instant reboots/lockups in qemu), the guilty change seems to be: @@ -70,14 +70,14 @@ static inline void __set_64bit(unsigned long long *ptr, - : /* no outputs */ + : "=m" (*ptr) Reverting just this line, ie diff --git a/arch/x86/include/asm/cmpxchg_32.h b/arch/x86/include/asm/cmpxchg_32.h index c1cf59d..15e8578 100644 --- a/arch/x86/include/asm/cmpxchg_32.h +++ b/arch/x86/include/asm/cmpxchg_32.h @@ -70,11 +70,11 @@ static inline void __set_64bit(unsigned long long *ptr, unsigned int low, unsigned int high) { asm volatile("\n1:\t" - "movl (%1), %%eax\n\t" - "movl 4(%1), %%edx\n\t" - LOCK_PREFIX "cmpxchg8b (%1)\n\t" + "movl (%0), %%eax\n\t" + "movl 4(%0), %%edx\n\t" + LOCK_PREFIX "cmpxchg8b (%0)\n\t" "jnz 1b" - : "=m" (*ptr) + : : "D" (ptr), "b" (low), "c" (high) results in a booting kernel (adding the magic volatile dust works too). The difference looks like in the diff (attached above). Didn't investigate further. FWIW, using "+m" instead of "=m" changes, iirc, zap_low_mappings() a bit, but does not result in a working kernel. The obvious ways of fixing this primitive unfortunately do not work that well -- the compiler generates even worse code than what's in that diff; larger stack frames and more dummy unused references. Something like the patch below expresses what's going on much better, actually works and even minimally improves the generated code instead of regressing (smaller stackframes and a few less moves). Only tested w/ gcc4.4 so far (the version relevant to this bug), hence unsuited for stable; a simple revert (as above) looks much safer. Note: there is a reason why there is no "A" constraint in here. Ditto for no dummy outputs and tmp vars (instead of low/high reuse). Otherwise the resulting code becomes even worse than w/ that "=m" addition, stack frames grow significantly etc. diff --git a/arch/x86/include/asm/cmpxchg_32.h b/arch/x86/include/asm/cmpxchg_32.h --- a/arch/x86/include/asm/cmpxchg_32.h +++ b/arch/x86/include/asm/cmpxchg_32.h @@ -70,15 +70,15 @@ static inline void __set_64bit(unsigned long long *ptr, unsigned int low, unsigned int high) { asm volatile("\n1:\t" - "movl (%1), %%eax\n\t" - "movl 4(%1), %%edx\n\t" - LOCK_PREFIX "cmpxchg8b (%1)\n\t" + LOCK_PREFIX "cmpxchg8b %0\n\t" "jnz 1b" - : "=m" (*ptr) - : "D" (ptr), - "b" (low), - "c" (high) - : "ax", "dx", "memory"); + : "+m" (*ptr), + "=a" (low), "=d" (high) + : "b" (low), + "c" (high), + "1" ((unsigned int)*ptr), + "2" ((unsigned int)(*ptr>>32)) + ); } static inline void __set_64bit_constant(unsigned long long *ptr,
What about trying the one-liner "volatile" patch that I asked you to try out? We're already fairly confident that the following two patches from mainline (which do roughly what you're proposing, but cleaner) solves the problem: 69309a05907546fb686b251d4ab041c26afe1e1d 4532b305e8f0c238dd73048068ff8a6dd1380291 ... but we were looking to see if a smaller patch could track down the gcc issue.
Actually, 69309a05907546fb686b251d4ab041c26afe1e1d by itself should solve the problem. Could you try that perhaps?
(In reply to comment #24) > What about trying the one-liner "volatile" patch that I asked you to try out? That wasn't me... But, as i already said above, the magic volatile dust (the oneliner) happened to work for my artificial testcase too (booting a kernel w/ the .config from this bug in qemu). > We're already fairly confident that the following two patches from mainline > (which do roughly what you're proposing, but cleaner) solves the problem: For some definition of cleaner. There was also a reason for that final 'note'... ;) I didn't look too closely; but, iirc, there was a "+A" in there and a few tmp vars -- both things didn't work well w/ the gcc4.4 i was testing with. Anyway, looking at that is on my todo list... > ... but we were looking to see if a smaller patch could track down the gcc > issue. I attached the full asm diff of a broken vs working kernel in case somebody wants to track down the exact cause; i can easily generate the unmunged versions of the five affected functions, if that will help. Didn't see anything obviously wrong, the bogus "=m" constraint (which is unused, but makes gcc generate another reference) adds a lot of noise and code movement. Considering that this affects (and was introduced in) stable, i'd prefer either reverting to a known-working state, or fixing it properly, instead of hoping that some magic volatiles or constraints just happen to hide the problem again. So i didn't look closely at that x86 code, and just fixed the asm()...
Thanks. bugzilla-daemon@bugzilla.kernel.org wrote: >https://bugzilla.kernel.org/show_bug.cgi?id=16612 > > > > > >--- Comment #26 from Artur Skawina <art.08.09@gmail.com> 2010-08-23 02:59:48 >--- >(In reply to comment #24) >> What about trying the one-liner "volatile" patch that I asked you to try >> out? > >That wasn't me... >But, as i already said above, the magic volatile dust (the oneliner) happened >to work for my artificial testcase too (booting a kernel w/ the .config from >this bug in qemu). > >> We're already fairly confident that the following two patches from mainline >> (which do roughly what you're proposing, but cleaner) solves the problem: > >For some definition of cleaner. There was also a reason for that final >'note'... ;) >I didn't look too closely; but, iirc, there was a "+A" in there and a few tmp >vars -- both things didn't work well w/ the gcc4.4 i was testing with. >Anyway, looking at that is on my todo list... > >> ... but we were looking to see if a smaller patch could track down the gcc >> issue. > >I attached the full asm diff of a broken vs working kernel in case somebody >wants to track down the exact cause; i can easily generate the unmunged >versions of the five affected functions, if that will help. Didn't see >anything >obviously wrong, the bogus "=m" constraint (which is unused, but makes gcc >generate another reference) adds a lot of noise and code movement. Considering >that this affects (and was introduced in) stable, i'd prefer either reverting >to a known-working state, or fixing it properly, instead of hoping that some >magic volatiles or constraints just happen to hide the problem again. So i >didn't look closely at that x86 code, and just fixed the asm()... > >-- >Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email >------- You are receiving this mail because: ------- >You are on the CC list for the bug. >You are watching the assignee of the bug.
"We need a testcase" is a standard reply in GCC bugreports, sure. Most probably just a pair of preprocessed sources (one with the old, one with new, set_64bit definition, together with gcc options used to compile it ought to be enough, at least for the beginning. As some progress have been made apparently in narrowing this down, can someone please update http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45312 with the requested information? From the above it seems it is init_32.i that needs to be attached. OT, if you are looking for set_64bit implementation that would be best scheduled by gcc, it would be static inline void set_64bit(u64 *ptr, u64 value) { __sync_lock_test_and_set(ptr, value); } The file needs to be compiled with -march=i586 or higher (not sure if it is common to compile kernel with say -march=i386, but still make it usable only on i586+, as cmpxchg8b insn is i586+), and only GCC 4.2+ support it on 64-bit types in 32-bit code (plus 4.1-RH).
I don't see where you get init_32.i from, but perhaps you're seeing something I'm not. Either way, __sync_lock_test_and_set() isn't really going to work for us, because of the gcc4.2+ report, and because of not-yet-merged patches which uses the kernel's alternatives mechanism to provide this API on legacy hardware. Either way, this *seems* to be a list of functions (significantly) affected by this change, at least in my particular build: zap_low_mappings() arch/x86/mm/init_32.o __change_page_attr_set_clr() arch/x86/mm/pageattr.o set_pmd_pfn() arch/x86/mm/pgtable_32.o __pte_alloc_kernel() mm/memory.o free_pgd_range() mm/memory.o -hpa
From #c22 I thought it is init_32.c. Anyway, could Artem check which one from the above list causes his system to be unbootable and attach the preprocessed source + cmdline options? Build with make V=1, add -save-temps to gcc options used to compile the object in question, attach *.i and mention the options?
FWIW, arch/x86/mm/pageattr.c:__change_page_attr_set_clr() seems to be the one causing problems here. The massive inlining in there doesn't help in spotting the problem, but this magic patch reliably makes the 2.5.35.2 continue past "Freeing unused kernel memory"... diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c --- a/arch/x86/mm/pageattr.c +++ b/arch/x86/mm/pageattr.c @@ -805,6 +805,7 @@ static int __change_page_attr_set_clr(struct cpa_data *cpa, int checkalias) ret = __change_page_attr(cpa, checkalias); if (!debug_pagealloc) spin_unlock(&cpa_lock); + printk(""); if (ret) return ret; The minimal diff between a working kernel and the crashing one is, for this function, the "=m" addition, as the previous comment. The munged diff (attached previously) didn't provide the smoking gun, iirc, the only real difference in that area was a different stack slot used. Couldn't make myself trace all of the ~800 lines of x86 code that this routine expands to...
arch/x86/mm/pageattr.c:__change_page_attr_set_clr() seems to be the one causing problems here. The massive inlining in there doesn't help in spotting the problem, but this magic patch reliably makes the 2.5.35.2 continue past "Freeing unused kernel memory"... diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c --- a/arch/x86/mm/pageattr.c +++ b/arch/x86/mm/pageattr.c @@ -805,6 +805,7 @@ static int __change_page_attr_set_clr(struct cpa_data *cpa, int checkalias) ret = __change_page_attr(cpa, checkalias); if (!debug_pagealloc) spin_unlock(&cpa_lock); + printk(""); if (ret) return ret; The minimal diff between a working kernel and the crashing one is, for this function, the "=m" addition, as the previous comment. The munged diff (attached previously) didn't provide the smoking gun, iirc, the only real difference in that area was a different stack slot used. Couldn't make myself trace all of the ~800 lines of x86 code that this routine expands to...
So, looking at the assembly difference (thanks to Arthur for mailining me the preprocessed sources and cmdline options), even with current trunk I see two differences between good and bad. 1) the -100(%ebp) and -104(%ebp) stack slots got swapped, this is not a bug, just an annoyance in diffing, so I've done sed -i -e 's/-100(/-10X(/;s/-104(/-100(/;s/-10X(/-104(/' pageattr-bad.s 2) the remaining diff (bad to good) is: .L74: movl -108(%ebp), %eax xorl %edx, %edx subl mem_map, %eax movl __supported_pte_mask, %ecx sarl $5, %eax andl $99, %ecx shldl $12, %eax, %edx - movl -104(%ebp), %edi + movl %ecx, %esi sall $12, %eax - movl %edx, %ebx - orl %eax, %ecx movl %edx, -28(%ebp) - movl %ecx, -32(%ebp) + orl %eax, %esi movl %edx, %ecx + movl %esi, -32(%ebp) + movl -104(%ebp), %edi + movl %esi, %ebx #APP # 72 "/mnt/b1/src/linux/set64/arch/x86/include/asm/cmpxchg_32.h" 1 1: movl (%edi), %eax movl 4(%edi), %edx .section .smp_locks,"a" .balign 4 .long 671f - . .previous 671: lock; cmpxchg8b (%edi) jnz 1b at the second cmpxchg8b. While -32(%ebp) long long slot is initialized correctly, in the bad case ebx will be the same as ecx (i.e. value >> 32 where value is (long long unsigned int) (long unsigned int) (((int) base - (int) mem_map) /[ex] 32) << 12 | __supported_pte_mask & 99), while in the good case ebx contains properly the low bits of value instead.
s/trunk/head of 4.4 branch/, sorry for confusion.
OK, so I think we can close this as a Linux bug. Please continue the discussion in the gcc bugzilla, obviously.
(In reply to comment #35) > OK, so I think we can close this as a Linux bug. Please continue the > discussion in the gcc bugzilla, obviously. I'm sorry for not answering earlier as I was on leave - is there anything else left for me to test or try? And from the conversation, I couldn't quite understand - is it the kernel or GCC's bug? Should I avoid using vanilla GCC 4.4.4 in the future?
This bug is fixed in yet to be released GCC 4.5.2 (yes, GCC 4.5.1 is also buggy, but somehow the problem wasn't triggered), 4.4.5 and 4.3.6. It's already fixed in SVN.