Created attachment 283321 [details] Revert commit 459e3a21 System comes up to the login prompt without changing from the initial terminal font to a smaller one as it usually does. When I supply user name and password, I just get the login prompt again. Ctl-Alt-Del to reboot gets message from INIT "No more processes at this run level". Reset button required. On rebooting good kernel, there is nothing in the log from the failing session. git bisect identifies commit 459e3a21535ae3c7a9a123650e54f5c882b8fcbf as the culprit. Reverting this commit in 5.1.12 gave a good working kernel again. The supplied patch is simply a revert. This will re-introduce the gcc-9 warnings that 459e3a21 was designed to eliminate. Sorry I have not attempted to address these warnings.
Hmm, this patch shouldn't change anything AFAICT. Adding some people to CC.
Whee. That's truly funky. That commit should make absoluely zero difference to code generation. It literally just changes the declaration of a variable from a single char to an array. And yes, we then take the address of that variable, but the address of an array should be the exact same thing as the address of the variable. So the _only_ effect of that change should be that the compiler now sees the proper size of the area, and thus avoids the warning that was brought in by gcc-9. I'd expect the bisection to be simply wrong, but the fact that the problem also goes away with a revert implies that it's a real issue. Duncan, could you send your config, and a boot log from a successful boot with the revert applied? I'll try to think about how this could *possibly* change any code generation at all..
Oh, and could you do make arch/x86/entry/vdso/vclock_gettime.o make arch/x86/entry/vdso/vdso32/vclock_gettime.o both with and without that revert (but otherwise identical kernel configs), and send me the four different object files? Also, is your user space 32-bit or 64-bit? And is it some old user space? That might explain why you see problems when nobody else does, although I still don't see how that change should affect anything. The address of an array should be the same as the address of the non-address variable, but admittedly we do some very odd things with the vdso's, and maybe there is some use that I'm not seeing. More likely I'm blind, and that commit is fundamentally broken for some obvious reason, and I'm just missing something really really obvious. And the only reason I don't see it is that I have modern user space and don't use the vdso's at all.
Oh, and *because* the vdso files are so magic from a build perspective, you shouldn't actually do that make arch/x86/entry/vdso/vdso32/vclock_gettime.o that I suggested. You should just do a full kernel build with and without the revert, and then those files should just be there. Maybe add all those arch/x86/entry/vdso/vdso*.so.dbg files to this bugzilla entry too.. Luto, what it is you did to debug that nasty nasty vdso build end result? I think you're the last person to have touched all that.
I'm actually testing this with my own compiler, and I see build differences myself. I still can't tell the *reason* for them, some of them seem to be just random changes just because gcc picked different registers, and I see how gcc had a different model for internal temporary numbers. So the change from 'char' to 'char []' definitely changes what gcc does. As far as I can tell, the changes are all benign, and in fact seem to look like improvements, ie: lea hvclock_page,%r12 mov hvclock_page,%edx lea 0x1d(%r12),%r8 mov %edx,%ebx and $0xfffffffe,%ebx testb $0x1,(%r8) je 444 <do_hres+0x44> becomes mov hvclock_page,%edx lea hvclock_page,%rbx and $0xfffffffe,%edx mov %edx,%r8d testb $0x1,hvclock_page+0x1d je 444 <do_hres+0x44> which seems simpler and doesn't have that unnecessary %r8 address generation. But the fact that register allocations change makes it really hard to see that all the rest really is the same code. There might be something I am missing. But this may well be compiler-specific, and clearly gcc does end up having different logic for the address of a single byte vs for the address of an array of bytes, and I'm either missing the meaningful difference, or the different compiler version you have just means that for you the changes aren't benign at all.
(In reply to Linus Torvalds from comment #4) > Luto, what it is you did to debug that nasty nasty vdso build end result? I > think you're the last person to have touched all that. A combination of objtool, QEMU+gdb, and swearing. And some tools I wrote to dump the vDSO image out of a running kernel. And make SUBDIRS=. BTW: $ make -j4 SUBDIRS=arch/x86/entry/vdso V=1 Makefile:213: ================= WARNING ================ Makefile:214: 'SUBDIRS' will be removed after Linux 5.3 Makefile:215: Please use 'M=' or 'KBUILD_EXTMOD' instead Makefile:216: ========================================== Can we please not remove SUBDIRS, or at least make sure we have a credible replacement? M= and KBUILD_EXTMOD don't do this at all.
(In reply to Andy Lutomirski from comment #6) > > $ make -j4 SUBDIRS=arch/x86/entry/vdso V=1 > Makefile:213: ================= WARNING ================ > Makefile:214: 'SUBDIRS' will be removed after Linux 5.3 > Makefile:215: Please use 'M=' or 'KBUILD_EXTMOD' instead > Makefile:216: ========================================== > > Can we please not remove SUBDIRS, or at least make sure we have a credible > replacement? M= and KBUILD_EXTMOD don't do this at all. What does SUBDIRS do for you that you don't get from just doing make -j4 V=1 arch/x86/entry/vdso/ Hmm?
Created attachment 283329 [details] Cleanup patch Does this cleanup patch make any difference to the behavior you see? It's what I should have done originally instead of 459e3a21535a ("gcc-9: properly declare the {pv,hv}clock_page storage") but I tried to keep the warning fixup patch _minimal_ instead of aiming for doing cleanups. It makes the casts pointless, because it just declares the magic tsc pages as having the right type to begin with. Honestly, I still don't see why any of this should matter, since the compiler should do the same thing regardless, but at least this cleans code up. I wonder if it makes gcc generate different object code?
Created attachment 283343 [details] .config with patch reverted (CONFIG_LOCALVERSION changed)
Created attachment 283345 [details] dmesg output on booting with patch reverted
Created attachment 283347 [details] log of boot with patch reverted (debug.log, logs everything)
(In reply to Linus Torvalds from comment #7) > (In reply to Andy Lutomirski from comment #6) > > > > $ make -j4 SUBDIRS=arch/x86/entry/vdso V=1 > > Makefile:213: ================= WARNING ================ > > Makefile:214: 'SUBDIRS' will be removed after Linux 5.3 > > Makefile:215: Please use 'M=' or 'KBUILD_EXTMOD' instead > > Makefile:216: ========================================== > > > > Can we please not remove SUBDIRS, or at least make sure we have a credible > > replacement? M= and KBUILD_EXTMOD don't do this at all. > > What does SUBDIRS do for you that you don't get from just doing > > make -j4 V=1 arch/x86/entry/vdso/ > > Hmm? I didn't know that worked. I tried it at some point without the trailing slash and nothing happened. Thanks!
(In reply to Linus Torvalds from comment #8) > Created attachment 283329 [details] > Cleanup patch > > Does this cleanup patch make any difference to the behavior you see? > > It's what I should have done originally instead of 459e3a21535a ("gcc-9: > properly declare the {pv,hv}clock_page storage") but I tried to keep the > warning fixup patch _minimal_ instead of aiming for doing cleanups. > > It makes the casts pointless, because it just declares the magic tsc pages > as having the right type to begin with. > > Honestly, I still don't see why any of this should matter, since the > compiler should do the same thing regardless, but at least this cleans code > up. I wonder if it makes gcc generate different object code? Linus, I like your patch except for the const. I agree that those variables are conceptually const, but I think that the declaration: extern const foo; means that gcc will assume that it matches some other definition: extern const foo = { ... }; and that it therefore will not change out from under the function being compiled. This could easily result in completely incorrect code generation. Also, from Duncan's logs, it looks like he's running on bare metal. Duncan, is that right? If so, what the heck is going on here? The code in question shouldn't be running at all. The logs do have this goody: [ 0.433315] tsc: Marking TSC unstable due to TSCs unsynchronized So we should be running with vclock_mode == VCLOCK_NONE, which is 0. Looking at the asm from my build with and without the patch reverted, it appears identical and correct. So I'm mystified. Duncan, can you send us the arch/x86/entry/vdso/vclock_gettime.o from a non-working kernel. Also, is this 32-bit userspace or 64-bit userspace?
I regret to have to advise that attachment 283343 [details] fails the same way as before. I'll gather together object files as requested - need to rebuild to get the working ones (d'oh) In reply to previous posts: > Also, is your user space 32-bit or 64-bit? And is it some old user space? As you'll see from dmesg / log, 64-bit. As to age: Slackware 14.2 patched up to 07-Feb-2019. So, gcc 5.5.0 - perhaps about the oldest gcc that will build modern kernels. I do have a VM running Slackware Current (with gcc-9) - would it be any help to try a build on that? > That might explain why you see problems when nobody else does Perhaps not quite nobody else - see https://www.linuxquestions.org/questions/slackware-14/new-kernel-5-1-5-1-2-boot-failure-strangeness-4175653980/ Looking again at that thread, @petejc in post #5 built under Slackware Current (might have been gcc-8) and that kernel worked on his old system. But I can re-check, if you would like.
> Also, from Duncan's logs, it looks like he's running on bare metal. Yes indeed. > Also, is this 32-bit userspace or 64-bit userspace? 64-bit (was composing my previous post when you asked)
Created attachment 283349 [details] vclock_gettime.o with new patch
(In reply to Duncan Roe from comment #14) > I regret to have to advise that attachment 283343 [details] fails the same > way as before. > I'll gather together object files as requested - need to rebuild to get the > working ones (d'oh) > In reply to previous posts: > > Also, is your user space 32-bit or 64-bit? And is it some old user space? > As you'll see from dmesg / log, 64-bit. > As to age: Slackware 14.2 patched up to 07-Feb-2019. So, gcc 5.5.0 - perhaps > about the oldest gcc that will build modern kernels. So there could easily be some weird GCC bug here. I'd be interested in seeing both a working (with the revert) and broken vclock_gettime.o from this GCC. > I do have a VM running Slackware Current (with gcc-9) - would it be any help > to try a build on that? It could be educational.
I'm not sure what your vclock_gettime.o corresponds to (what does "new patch" mean?), but it disassembles like this: 0000000000000000 <do_hres>: 0: 55 push %rbp 1: 44 0f be 1d 00 00 00 movsbl 0x0(%rip),%r11d # 9 <do_hres+0x9> 8: 00 5: R_X86_64_PC32 pvclock_page+0x18 and there's the problem! Accessing pvclock_page if we're not using the pvclock is not okay. It looks like we either have some header screwup or gcc 5.5.0 is buggy, since gtod_read_begin() contains an smp_rmb(), and that should be enough to prevent the hoisting that gcc is doing. That being said, it would be perfectly legal, if rather silly, for GCC to hoist the loads before the vclock_mode checks but after gtod_read_begin(). I'll attach a patch to try to fix this. Linus, any insight as to why gcc appears to be miscompiling the whole function? I'm wondering if there's some gcc bug that causes it to think that const pointers point to unchanging data even if there's a barrier. Hmm, barrier is just a "memory" clobber -- maybe gcc thinks that clobbering memory doesn't clobber things that are only accessible by const pointer. Duncan, can you try Linus' patch minus the const?
Created attachment 283351 [details] patch to add potentially missing barrier
> (what does "new patch" mean?) attachment 283329 [details]
I tried to provoke this miscompilation using godbolt with old GCCs and I failed. Here's how far I got: struct foo { int val; }; extern unsigned char x[4096] __attribute__((visibility("hidden"))); static const struct foo *get_ptr(void) { return (const struct foo *)&x; } int func(void) { int i; const int *ptr = &get_ptr()->val; int ret = 0; for (i = 0; i < 10; i++) { asm volatile ("" ::: "memory"); ret += *ptr; } return ret; } If I make that "extern const unsigned char x[4096]", then the barrier no longer takes effect, even on modern gcc, so Linus' patch is definitely wrong. But I can't get the miscompilation without that const. I feel like I'm still missing some subtlely.
(In reply to Duncan Roe from comment #20) > > (what does "new patch" mean?) > attachment 283329 [details] Ah, got it. So it's exactly the issue in comment 21 if Linus' patch is applied. Can you post vclock_gettime.o from an unpatched kernel?
Will try attachment 283329 [details] first, OK?
Created attachment 283353 [details] All requested .o & dbg files 3 directories with subdirs for vdso32. "original" is from laptop build area - same GCC but is 5.1.11 (others are 5.1.12)
(In reply to Duncan Roe from comment #24) "original" means "unpatched" "new patch" means attachment 283329 [details] as before
(In reply to Duncan Roe from comment #23) > Will try attachment 283329 [details] first, OK? I meant attachment 283351 [details] (barrier.patch). I just apply that to otherwise unpatched kernel, right?
(In reply to Andy Lutomirski from comment #6) > And some tools I wrote to dump the vDSO image out of a running kernel. This doesn't come up for the first time. Maybe we should have an vdso dumper in tools/ ...
(In reply to Andy Lutomirski from comment #13) > [ 0.433315] tsc: Marking TSC unstable due to TSCs unsynchronized He has an AMD K8 which has unstable TSC.
Results of 2 builds: With barrier.patch, built with GCC-5.5.0 - fails as before Unpatched, built with GCC-9.1.0 - works fine
2f: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # 35 <do_hres+0x35> 31: R_X86_64_PC32 vvar_vsyscall_gtod_data 35: 83 f8 01 cmp $0x1,%eax 38: 74 5b je 95 <do_hres+0x95> We get here if TSC is unusable, i.e. only on your old laptop. 3a: 83 f8 02 cmp $0x2,%eax Are we using pvclock? 3d: 44 8b 15 00 00 00 00 mov 0x0(%rip),%r10d # 44 <do_hres+0x44 > 40: R_X86_64_PC32 pvclock_page-0x4 Muahaha, segfault! Thanks, GCC. 44: 0f 84 c8 00 00 00 je 112 <do_hres+0x112> My barrier patch wasn't quite good enough. I'll attach a better one.
Created attachment 283373 [details] Proposed fix
(In reply to Andy Lutomirski from comment #31) > Created attachment 283373 [details] > Proposed fix That looks like the obviously right thing to do. Duncan, if you just verify that that patch (on top of original sources - no revert or anything else) makes things work for you - and I obviously think it will - then I'll apply it. Andy, I'd want the proper sign-off and commit message, of course. Thanks everybody.
(In reply to Linus Torvalds from comment #32) > Andy, I'd want the proper sign-off and commit message, of course. You just have to click harder :) Apparently clicking "Proposed fix" or "details" will give it to you. If you click "View" or "Details" from the diff viewer, you'll also get it, but if you click "Raw Unified", you get a very stripped down diff.
P.S. Linus, I'd like to pick up and fix your cleanup, too. Can I have a Signed-off-by?
(In reply to Andy Lutomirski from comment #34) > P.S. Linus, I'd like to pick up and fix your cleanup, too. Can I have a > Signed-off-by? Sure. You can add my sign-off. I'm wondering if that patch should be changed to bit just not have the "const", but indeed mark the whole struct "volatile" instead? I think that would require the argument types to be fixed up too, but I think it would make the barrier unnecessary. What do you think?
(In reply to Linus Torvalds from comment #35) > (In reply to Andy Lutomirski from comment #34) > > P.S. Linus, I'd like to pick up and fix your cleanup, too. Can I have a > > Signed-off-by? > > Sure. You can add my sign-off. > > I'm wondering if that patch should be changed to bit just not have the > "const", but indeed mark the whole struct "volatile" instead? > > I think that would require the argument types to be fixed up too, but I > think it would make the barrier unnecessary. > I think it would indeed make the barrier unnecessary, but I'd be concerned that the resulting code will be horrible. For example, __pvclock_read_cycles does: static __always_inline u64 __pvclock_read_cycles(const struct pvclock_vcpu_time_info *src, u64 tsc) { u64 delta = tsc - src->tsc_timestamp; u64 offset = pvclock_scale_delta(delta, src->tsc_to_system_mul, src->tsc_shift); return src->system_time + offset; } and pvclock_scale_delta has this: __asm__ ( "mulq %[mul_frac] ; shrd $32, %[hi], %[lo]" : [lo]"=a"(product), [hi]"=d"(tmp) : "0"(delta), [mul_frac]"rm"((u64)mul_frac)); With volatile, GCC has to emit the loads in exactly the order in which they're sequenced. I give it a quick try, and the assembly looks worse in ways that don't even look like they're required by the standard. This: 11c: f6 05 00 00 00 00 01 testb $0x1,0x0(%rip) # 123 <do_hres+0x123> 11e: R_X86_64_PC32 pvclock_page+0x18 becomes: 121: 0f b6 05 00 00 00 00 movzbl 0x0(%rip),%eax # 128 <do_hres+0x128> 124: R_X86_64_PC32 pvclock_page+0x19 128: a8 01 test $0x1,%al and there are quite a few examples of transformations like that in the diff. I suppose we could try to play awful games where we make the pages unconditionally readable, but that gets a bit gross, since it's not at all clear what's supposed to be mapped there if there is no pvclock. I looked briefly at using static_cpu_has(X86_FEATURE_HYPERVISOR), but I'm not convinced it'll help anything performance or correctness-wise.
Fair enough. Gcc does indeed make a mess of volatile accesses and won't even do the obvious instruction simplification. So yeah, take my patch, remove the const, and it is at least a small cleanup. Thanks
Ok, I have applied Luto's patch, but apparently I can't close the bug since I'm not the owner of it. Duncan, please close the bug once you've tested the patch. And if it turns out that Andy's barrier patch doesn't fix it (but I really am pretty sure it does), please holler and we'll continue to look at it. Luto - I'll leave the cleanup patch to you. I don't think it's necessarily worth it at this stage, and I think it would be better to do as part of the 5.3 merge window or something. But it's obvious enough that if you *want* to make it part of -tip earlier, I won't object. Of course, the patch that ended up exposing this gcc instruction scheduling issue was "obvious" too.
[ Irrelevant side comment ] (In reply to Andy Lutomirski from comment #33) > (In reply to Linus Torvalds from comment #32) > > Andy, I'd want the proper sign-off and commit message, of course. > > You just have to click harder :) Yeah, the bugzilla patch view seems odd. If I ask to view the diff, I only see the diff. And if I ask to view the attachment, I just get "attachment is not viewable in your browser". So I never even noticed that you had a good commit message and sign-off, because the only thing I see in the browser is the diff. Downloading it worked fine. I think there's a good reason we usually do things in email.
Fixed by attachment 283373 [details]