Created attachment 289369 (cpuinfo)

When booting a kernel >= 3.5, my computer gets stuck during the boot process. The motherboard is a Supermicro X6DAL-TG with two Intel(R) Xeon(TM) CPU 3.40GHz processors.

The problem only happens with SMP 64-bit builds: 32-bit builds work fine, and 64-bit builds booted with the nosmp option also work fine. I rebuilt kernels until I found that the faulty commit is 8e029fcdd8702719c9179317cae9ef84ebe7027e, from branch 'x86-trampoline-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.

To gather more information, I built a kernel that does not set the NX flag in trampoline_64.S, and that definitely fixes the issue. Looking at my CPU flags, it appears that out of the 4 logical cores, only 2 have the nx capability. In the attached /proc/cpuinfo you can see that 2 of the CPUs do not have the nx flag.
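The per-CPU flag comparison described above can be automated. A minimal Python sketch, assuming the usual x86 /proc/cpuinfo layout (one block per logical CPU, separated by blank lines, with "processor" and "flags" keys); the helper names are hypothetical:

```python
import os


def parse_cpuinfo(text):
    """Return a list of dicts, one per logical CPU, from /proc/cpuinfo text."""
    cpus = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends one processor's block.
            if current:
                cpus.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        cpus.append(current)
    return cpus


def cpus_missing_flag(cpus, flag):
    """Return the 'processor' numbers whose flags line lacks `flag`."""
    return [int(c["processor"]) for c in cpus
            if flag not in c.get("flags", "").split()]


if __name__ == "__main__":
    path = "/proc/cpuinfo"
    if os.path.exists(path):
        with open(path) as f:
            cpus = parse_cpuinfo(f.read())
        print(f"{len(cpus)} logical CPUs, without nx: "
              f"{cpus_missing_flag(cpus, 'nx')}")
```

On a machine like the one in this report, the list printed at the end should be non-empty; on a homogeneous system it should be empty.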
You should've led with the fact that you have two different CPUs in there, which is why only one of them has the "nx" flag. I'm guessing booting with "noexec=off" should fix your issue...
To be honest, I did not know that my CPUs were different, and I did not suspect anything related to CPU flags until I saw the code change in trampoline_64.S. I just tried booting with noexec=off, but it has no effect; I am still stuck at boot time.
Hmmkay. The other thing we could try is for you to swap your CPUs on the motherboard so that CPU0 in /proc/cpuinfo *doesn't* have the "nx" flag, and then use that to turn NX off on the remaining cores during boot. I think the patch for that should not be too ugly, but you'd first have to test whether your machine, with the CPUs physically swapped, still boots the kernels that work, and if so, keep 'em that way. The other solution I'm thinking of would be to move "noexec", and thus cmdline parsing, very early, which would get very ugly very fast and is not worth it. HTH.
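The boot-CPU-driven policy proposed here can be sketched as a toy model. This is not kernel code: it assumes, as the report suggests, that the secondary CPUs get NX enabled whenever the boot CPU has it, and both the function name and the CPU orderings below are hypothetical, chosen only for illustration.

```python
def secondaries_that_would_fault(cpu_has_nx):
    """Toy model (hypothetical, not kernel code) of a boot-CPU-driven NX policy.

    cpu_has_nx: one boolean per logical CPU, boot CPU (CPU0) first.
    Assumption: the secondaries get NX enabled iff the boot CPU has nx.
    Returns the secondary CPU indices that would be told to enable NX
    without supporting it.
    """
    enable_nx = cpu_has_nx[0]  # decision taken from CPU0 only
    if not enable_nx:
        return []  # NX left off everywhere, so no core is asked to use it
    return [i for i, has_nx in enumerate(cpu_has_nx) if i > 0 and not has_nx]


# Illustrative ordering: CPU0 has nx, two logical CPUs do not.
print(secondaries_that_would_fault([True, True, False, False]))  # [2, 3]
# Illustrative swapped ordering: CPU0 lacks nx, so NX stays off on every core.
print(secondaries_that_would_fault([False, False, True, True]))  # []
```

The model shows why the physical swap matters: the decision is only safe when the least capable CPU is the one the decision is read from.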
Hi, many thanks for the time you spent on this issue, Borislav. As you suggested, I swapped the 2 CPUs and managed to boot successfully using a prebuilt kernel (MX Linux 64-bit, on a USB stick). However, this worked *once* only... I must have damaged something in this old guy when swapping the CPUs, as I now have huge instability, even before any Linux code runs (sometimes the BIOS does not beep, sometimes it reboots randomly, sometimes it crashes while booting, ...). I had enough time to confirm that all 4 logical CPUs were available, and that the first one was indeed one without nx support. Sadly, swapping the CPUs back does not help, nor does running on a single CPU, changing the graphics adapter, or changing the DDR modules. I guess the CPUs are OK and the issue is somewhere else, but for now I can't figure out where. In conclusion, I think we can confirm that putting the CPU with the lower spec first works, but unfortunately I will not be able to test any patch you might provide, as my hardware will most likely RIP. Sorry for not being able to follow up...
Ouch, sorry to hear that. So all that sounds like CPUs of mixed steppings should not be put on a single board. How did that box even come into existence? Having two different CPUs in a single platform is simply trouble waiting to happen, and it'd be very hard to find someone who actually still runs such a setup.
In fact, it seems that Intel states: "Mixing processors of different steppings but the same model (as per CPUID instruction) is supported". In my case, the model is different; I guess changing the capabilities of a processor implies incrementing the model, not only the stepping. I got this computer from my company, where these machines were the standard setup for programmers until around 2007. It worked for years running various versions of Windows, until I recently wanted to install something lighter.
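Whether a given set of processors falls under that rule can be checked from the "cpu family", "model" and "stepping" fields of /proc/cpuinfo. A minimal Python sketch, assuming each logical CPU has already been parsed into a dict keyed by those field names; the function name and the return labels are made up for illustration:

```python
def mixing_class(cpus):
    """Classify a multi-processor setup from parsed /proc/cpuinfo entries.

    cpus: list of dicts with the /proc/cpuinfo keys "cpu family",
    "model" and "stepping" (values kept as strings).
    Returns "identical", "stepping-only" (same family/model, different
    stepping) or "model-differs".
    """
    models = {(c["cpu family"], c["model"]) for c in cpus}
    steppings = {c["stepping"] for c in cpus}
    if len(models) > 1:
        return "model-differs"  # outside the quoted "same model" rule
    if len(steppings) > 1:
        return "stepping-only"  # the case the quoted statement covers
    return "identical"


# Illustrative values only, not taken from the attached cpuinfo.
cpus = [
    {"cpu family": "15", "model": "4", "stepping": "1"},
    {"cpu family": "15", "model": "4", "stepping": "3"},
]
print(mixing_class(cpus))  # stepping-only
```

Under the reporter's reading, this machine would land in the "model-differs" bucket, which the quoted statement does not cover.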