Bug 207919 - Stuck at boot, dual Xeon, NX flag
Summary: Stuck at boot, dual Xeon, NX flag
Status: NEW
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: x86-64
Hardware: Intel Linux
Importance: P1 blocking
Assignee: platform_x86_64@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-05-27 21:31 UTC by baptiste.moiroux
Modified: 2020-05-29 21:32 UTC
CC List: 1 user

See Also:
Kernel Version: any, starting with 3.5
Subsystem:
Regression: No
Bisected commit-id:


Attachments
cpuinfo (2.59 KB, text/plain)
2020-05-27 21:31 UTC, baptiste.moiroux

Description baptiste.moiroux 2020-05-27 21:31:57 UTC
Created attachment 289369
cpuinfo

When using a kernel >= 3.5, my computer gets stuck during the boot process.
The motherboard is a Supermicro X6DAL-TG with two Intel(R) Xeon(TM) CPU 3.40GHz processors.

This problem only happens with 64-bit SMP builds: 32-bit builds work fine, and 64-bit builds booted with the nosmp option also work fine.

I rebuilt kernels until I found that the faulty commit is 8e029fcdd8702719c9179317cae9ef84ebe7027e, from branch 'x86-trampoline-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.

To gather more information, I built a kernel that does not set the NX flag in trampoline_64.S, and this definitely fixes the issue. Looking at my CPU flags, it appears that of the 4 logical cores, only 2 have the nx capability.
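For context on why setting NX in the trampoline can hang a secondary CPU: NX support is advertised in CPUID leaf 0x80000001 (bit 20 of EDX, shown as "nx" in /proc/cpuinfo) and is switched on through the NXE bit (bit 11) of the IA32_EFER MSR. Per the architecture manuals, NXE is a reserved bit on a CPU without NX, and setting a reserved EFER bit with WRMSR raises #GP. The Python sketch below only decodes those bits from raw register values; `efer_for_cpu` is a hypothetical helper for illustration, not code from the kernel or from the bisected commit.

```python
# Bit positions taken from the Intel/AMD architecture manuals.
CPUID_80000001_EDX_NX = 1 << 20  # "nx" flag in /proc/cpuinfo
EFER_NXE = 1 << 11               # IA32_EFER (MSR 0xC0000080), No-Execute Enable


def cpu_supports_nx(edx_80000001: int) -> bool:
    """Return True if CPUID.80000001H:EDX reports NX support."""
    return bool(edx_80000001 & CPUID_80000001_EDX_NX)


def efer_for_cpu(requested_efer: int, edx_80000001: int) -> int:
    """Hypothetical helper: clear NXE if this CPU cannot honour it.

    On a CPU without NX, NXE is reserved, so writing an EFER value with
    NXE set would fault. A startup path that copied NXE unconditionally
    to every CPU would hit exactly that on the non-NX cores.
    """
    if not cpu_supports_nx(edx_80000001):
        return requested_efer & ~EFER_NXE
    return requested_efer
```

For example, `efer_for_cpu(0x901, 0)` (SCE | LME | NXE requested on a CPU without NX) yields `0x101`, while the same request on an NX-capable CPU is returned unchanged.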

You can find my /proc/cpuinfo attached; it shows that 2 of the CPUs do not have the nx flag.
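A quick way to spot which logical CPUs lack a flag in a /proc/cpuinfo dump like the attached one is to pair each "processor" line with the "flags" line that follows it. This is a minimal sketch assuming the x86 /proc/cpuinfo layout (`key : value` lines, one block per logical CPU); the function name is hypothetical.

```python
def cpus_without_flag(cpuinfo_text: str, flag: str = "nx") -> list[int]:
    """Return the 'processor' numbers whose 'flags' line lacks `flag`.

    Assumes the x86 /proc/cpuinfo format, where each logical CPU block
    has a 'processor' line followed later by a 'flags' line.
    """
    missing = []
    processor = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between CPU blocks
        key = key.strip()
        if key == "processor":
            processor = int(value)
        elif key == "flags" and processor is not None:
            if flag not in value.split():
                missing.append(processor)
    return missing
```

On a live system this could be run as `cpus_without_flag(open("/proc/cpuinfo").read())`; on the reporter's machine it would be expected to list the 2 logical CPUs without nx.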
Comment 1 Borislav Petkov 2020-05-28 10:03:42 UTC
You should have led with the fact that you have two different CPUs in there, hence the "nx" flag on only one of them. I'm guessing booting with "noexec=off" should fix your issue...
Comment 2 baptiste.moiroux 2020-05-28 12:32:22 UTC
To be honest, I did not know that my CPUs were different, and I did not suspect anything related to CPU flags until I saw the code change in trampoline_64.S.

I just tried booting with noexec=off, but it has no effect; I am still stuck at boot time.
Comment 3 Borislav Petkov 2020-05-28 13:42:22 UTC
Hmmkay. The other thing we could try is for you to swap your CPUs on the
motherboard so that CPU0 in /proc/cpuinfo *doesn't* have the "nx" flag
and then use that in order to turn it off on the remaining cores during
boot.

I think the patch to fix that should not be too ugly but you'd have to
test first whether your machine still boots with the kernels that work,
with physically swapped CPUs and if so, keep 'em that way.
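The swap works as a workaround if the boot CPU (CPU0) decides whether NX gets enabled system-wide: when CPU0 lacks NX, nothing enables it, so no secondary CPU is asked to set a bit it does not support. The Python model below is a simplified sketch of that reasoning only; `plan_nx` is hypothetical and does not describe the kernel's actual code path.

```python
def plan_nx(cpu_has_nx: list[bool]) -> tuple[bool, list[int]]:
    """Simplified model: the boot CPU (index 0) decides NX for everyone.

    Returns (nx_enabled, cpus_that_would_fault): if NX is enabled
    system-wide, every CPU lacking NX would fault when bringing it up.
    """
    nx_enabled = cpu_has_nx[0]
    faulting = [i for i, has in enumerate(cpu_has_nx)
                if nx_enabled and not has]
    return nx_enabled, faulting
```

With an NX-capable boot CPU, `plan_nx([True, True, False, False])` gives `(True, [2, 3])`; with the CPUs swapped, `plan_nx([False, False, True, True])` gives `(False, [])`. The CPU ordering here is illustrative, not taken from the reporter's cpuinfo.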

The other solution I'm thinking of would be to move "noexec" and thus
cmdline parsing very early, which will become very ugly very fast and
that is not worth it.

HTH.
Comment 4 baptiste.moiroux 2020-05-29 19:22:46 UTC
Hi

Many thanks for your time spent on this issue, Borislav.
As you suggested, I swapped the 2 CPUs and managed to boot successfully using a prebuilt kernel (MX Linux 64-bit, on a USB stick).

However, this worked *once* only... I must have damaged something on this old machine when swapping the CPUs, as I now have severe instability even before any Linux code runs (sometimes the BIOS does not beep, sometimes it reboots randomly, sometimes it crashes while booting, ...). I had enough time to confirm that all 4 CPUs were available and that the first one was indeed the one without nx support.

Sadly, swapping the CPUs back does not help, and neither does running on a single CPU, changing the graphics adapter, or changing the DDR modules. I guess the CPUs are OK and the issue is somewhere else, but for now I can't figure out where.

In conclusion, I think we can confirm that putting the CPU with the lowest spec first works, but unfortunately I will not be able to test any patch you might provide, as my hardware will most likely RIP.

Sorry for not being able to follow up...
Comment 5 Borislav Petkov 2020-05-29 20:26:20 UTC
Ouch, sorry to hear that.

So all that sounds like CPUs of mixed steppings should not be on a single board. How did that box even come into existence? Having two different CPUs in a single platform is simply trouble waiting to happen, and it'd be very hard to find someone who actually still does that.
Comment 6 baptiste.moiroux 2020-05-29 21:32:25 UTC
In fact, it seems that Intel said "Mixing processors of different steppings but the same model (as per CPUID instruction) is supported". In my case, the model is different; I guess changing the capabilities of a processor implies incrementing the model, not only the stepping.
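Whether two processors share "the same model (as per CPUID instruction)" can be checked from CPUID leaf 1, EAX, which packs stepping, model, family and the extended model/family fields; /proc/cpuinfo shows the decoded results as "cpu family", "model" and "stepping". The sketch below follows the display-family/display-model rules from Intel's documentation; the helper names are hypothetical and the example values are illustrative, not read from the reporter's machine.

```python
def decode_signature(eax: int) -> dict[str, int]:
    """Decode CPUID leaf 1 EAX into display family, model and stepping."""
    stepping = eax & 0xF
    model = (eax >> 4) & 0xF
    family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF
    # Extended family only applies when the base family is 0xF.
    display_family = family + ext_family if family == 0xF else family
    # Extended model applies to families 0x6 and 0xF.
    if family in (0x6, 0xF):
        display_model = (ext_model << 4) + model
    else:
        display_model = model
    return {"family": display_family, "model": display_model,
            "stepping": stepping}


def same_model(eax_a: int, eax_b: int) -> bool:
    """True if both signatures share family and model (stepping may differ)."""
    a, b = decode_signature(eax_a), decode_signature(eax_b)
    return (a["family"], a["model"]) == (b["family"], b["model"])
```

For instance, `decode_signature(0xF41)` gives family 15, model 4, stepping 1; `same_model(0xF41, 0xF43)` is True (only the stepping differs), while `same_model(0xF34, 0xF41)` is False.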
I got that computer from my company, where those machines were the standard setup for programmers until around 2007. It worked for years running various versions of Windows, until I recently wanted to install something lighter.
