Created attachment 254551
Hello. The exact problem is at the beginning of the dmesg attached to this bug. I noticed this problem right about when the kernel changed from the 4.8 series to the 4.9 series. Right before that, it worked with 4 cores * 2 (for hyperthreading). Right now, /proc/cpuinfo shows 1 core. Before it showed 8.
If noacpi is passed at boot, the system boots with 4 cores (and hyperthreading seems to be off).
Will attach config, acpidump and lspci dump shortly. Let me know if I can provide further information.
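For comparing boots, here is a minimal sketch of how the logical CPU count could be read out of /proc/cpuinfo. It assumes the usual x86 layout, where each "processor" entry is followed by "physical id" and "core id" fields; the function name summarize_cpuinfo is made up for illustration.

```python
from pathlib import Path


def summarize_cpuinfo(text):
    """Count logical CPUs, physical packages and distinct cores in cpuinfo text."""
    logical = 0
    packages = set()
    cores = set()
    physical_id = None
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            # A new logical CPU entry starts here.
            logical += 1
            physical_id = None
        elif key == "physical id":
            physical_id = value
            packages.add(value)
        elif key == "core id":
            # Cores are only unique within a package, so key on both.
            cores.add((physical_id, value))
    return {"logical": logical, "packages": len(packages), "cores": len(cores)}


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print(summarize_cpuinfo(path.read_text()))
```

On the machine in this report, the expectation would be 8 logical CPUs on a good kernel and 1 on a bad one.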
Created attachment 254561
Created attachment 254571
Created attachment 254581
Hello. A little follow-up to this bug report. I've managed to try the new 4.9.10 kernel, as well as 4.4.49. I will be posting the relevant files. Sorry for the missing lspci for 4.9.10; I had only 2-3 minutes between boots. The server is in a remote location, and I rarely have the opportunity to catch someone over the phone to do reboots for me.
So, as an update, with kernel 4.9.10 it started with 2 CPUs, 0 and 6 for some reason, I think. With 4.4.49 it started with the normal 8. Both kernels (and the earlier 4.9.9 and 4.9.8, which produced the logs in the original bug report) were compiled with basically the same config. That didn't change.
Given that it's the same machine and the same config, and only the kernel version differs between the results, I'm going to assume it's not me, or Gentoo, but the kernel itself, this one time.
So logs incoming, in the hope that machines like that poor DL380 G7 will be taken care of and get the love they deserve. And again, if I can provide further information...
Created attachment 254797
Created attachment 254799
Created attachment 254801
Created attachment 254803
Hi Alexandru, according to the 4.4.49 dmesg, only one CPU is brought up?
And per the 4.9 dmesg,
[ 0.000000] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/apic/apic.c:2065
it seems that Linux got a wrong APIC ID from the MADT table.
But did this issue appear after the update from 4.8 to 4.9? Do you have time to run a git bisect for it?
Hello. There is only ONE physical CPU: 4 cores * 2 with hyperthreading, 8 logical CPUs in total.
With the 4.4.x series all of them are brought up. With the 4.9 and now the 4.10 series only one logical core is up, although the kernel knows there are 8; it just can't handle the order, it seems.
With the 4.8 series, again, everything worked for a while (at the start of the series). Sometime between 4.8 and 4.9 it stopped working.
It is not my machine, and I do not have remote access. The next time I am scheduled to go there and update will be sometime in April. I plan to try the latest 4.10 and the latest 4.4, and if neither works I plan to try all kernels from 4.8 until the current day to find out at which release the bug appeared (which I assume is git bisecting). But I don't know if the owner will allow me that much time for testing. I can compile the kernels at home and bring them already prepared, but that damn machine takes 5 minutes to boot, so 10 kernels means about 1 hour of downtime. I'll do my best to try as many of them as I can.
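To put rough numbers on the downtime, here is a small sketch comparing release-by-release testing with a real git bisect. The 5-minute boot time comes from the comment above; the commit count of 16000 between the good and bad versions is only an assumed round figure for illustration, not a measured one, and both function names are hypothetical.

```python
def bisect_steps(commit_count):
    """Upper bound on kernels to test when bisecting commit_count candidates."""
    if commit_count < 1:
        raise ValueError("commit_count must be at least 1")
    # Each test halves the candidate range, so this is ceil(log2(n)).
    return (commit_count - 1).bit_length()


def downtime_minutes(boots, boot_minutes=5):
    """Total reboot time, ignoring build time and time spent collecting logs."""
    return boots * boot_minutes


if __name__ == "__main__":
    assumed_commits = 16000  # assumption, not taken from this report
    steps = bisect_steps(assumed_commits)
    print("bisect boots:", steps, "minutes:", downtime_minutes(steps))
    print("10 releases, minutes:", downtime_minutes(10))
```

One caveat: the commit tested at each bisect step depends on the previous result, so those kernels cannot all be compiled at home in advance the way a fixed list of releases can.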
To ease your load, let's first debug this without recompiling the kernel to see if there is any clue. Please boot both 4.9 and 4.4 (the bad and the good version) and test as follows:
1. Append the following parameters to the kernel command line in grub:
"ftrace=function_graph ftrace_graph_filter=_cpu_up trace_event=cpuhp:cpuhp_exit,cpuhp:cpuhp_multi_enter,cpuhp:cpuhp_enter"
2. Boot the system, and then provide the output of:
cat /sys/kernel/debug/tracing/trace > ~/cpuonline.log
BTW, please also test with a minimal driver load by appending "init=/bin/bash" to the kernel command line in grub.
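Alongside the trace, it may help to record which CPUs actually came online on each boot. The sketch below parses the kernel's CPU list format (for example "0-7" or "0,6") as found in /sys/devices/system/cpu/online and /sys/devices/system/cpu/present. The helper names are made up for illustration, and it is an assumption that "present" still lists all CPUs on the bad kernel; if the APIC IDs are rejected early, that file may be short as well.

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a kernel CPU list string such as '0-3,6' into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        first, sep, last = part.partition("-")
        if sep:
            # A range like '0-3' is inclusive on both ends.
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(first))
    return cpus


def offline_cpus(present_text, online_text):
    """Return CPUs listed as present but not online."""
    online = set(parse_cpulist(online_text))
    return [cpu for cpu in parse_cpulist(present_text) if cpu not in online]


if __name__ == "__main__":
    base = Path("/sys/devices/system/cpu")
    if (base / "online").exists() and (base / "present").exists():
        present = (base / "present").read_text()
        online = (base / "online").read_text()
        print("online:", parse_cpulist(online))
        print("present but offline:", offline_cpus(present, online))
```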
Hello. I'll do my best to boot with these parameters and return the log. I expect to do it this coming week or the next. I'll probably try 4.10.8 and 4.4.59, or whatever is latest at that moment.
Created attachment 255821
Created attachment 255823
Created attachment 255825
Created attachment 255827
Created attachment 255829
Created attachment 255831
HP ProLiant DL360 G6, BIOS P64 08/16/2015
1 CPU X5550
With kernel 4.7.7 it started with 4 CPU cores
With kernel 4.10.8 it started with 1 CPU core
HP ProLiant DL360 G7, BIOS P68 08/16/2015
2 CPU X5650
With kernel 4.7.7 it started with 24 CPU cores
With kernel 4.10.6 it started with 20 CPU cores
I noticed a suspicious point: there is a bug that might be related to this issue, which was introduced in 4.9 and fixed in 4.11-rc3.
the bug was introduced in
Author: Gu Zheng <firstname.lastname@example.org>
Date: Thu Aug 25 16:35:15 2016 +0800
and was fixed in
Author: Dou Liyang <email@example.com>
Date: Fri Mar 3 16:02:25 2017 +0800
Please test the latest upstream kernel (NOT the distribution one):
git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
and compile this one.
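As a quick sanity check before rebuilding, the sketch below tests whether a given kernel release string falls in the window named above (introduced in 4.9, fixed in 4.11-rc3). It is only a rough filter under stated assumptions: it looks at the major.minor pair alone, so it does not know whether a stable point release picked up the fix, and it treats 4.11 release candidates before rc3 as fixed even though they would not be. The function names are hypothetical.

```python
import platform
import re


def parse_version(release):
    """Extract (major, minor) from a release string such as '4.9.10-gentoo'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError("unrecognized kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def in_suspect_window(release):
    """True for 4.9.x and 4.10.x, the series reported as affected in this thread."""
    return (4, 9) <= parse_version(release) < (4, 11)


if __name__ == "__main__":
    running = platform.release()
    try:
        print(running, "in suspect window:", in_suspect_window(running))
    except ValueError as error:
        print(error)
```

The versions reported in this thread line up with that window: 4.9.10, 4.10.6 and 4.10.8 fall inside it, while 4.4.49 and 4.7.7 fall outside.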
I'm closing this one as there has been no response for a while. Please feel free to reopen if Comment 21 doesn't work for you.