Bug 202385

Summary: Inexplicable warning message about NR_CPUS limit exceeded
Product: Other
Reporter: Francesco Turco (fturco)
Component: Other
Assignee: other_other
Status: NEW
Severity: normal
CC: rafdev
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 4.20.3
Subsystem:
Regression: No
Bisected commit-id:
Attachments: lscpu.txt, dmesg.txt, config

Description Francesco Turco 2019-01-22 19:11:08 UTC
I have a Gentoo Linux desktop system with an old dual-core Intel Core 2 Duo E8400 CPU and an Intel DQ35JO motherboard.

Since my CPU is dual-core, I built my kernel with the CONFIG_NR_CPUS=2 option. But after booting my system, I get a warning message in the logs:

Jan 22 19:57:33 desktop kernel: smpboot: 4 Processors exceeds NR_CPUS limit of 2

If I set CONFIG_NR_CPUS=4 instead, the warning message disappears, but I don't understand where the number 4 comes from.

As far as I know, my CPU doesn't support hyper-threading. There is no reference to hyper-threading in the online specs and in the PDF manuals. There is also no reference to hyper-threading in the BIOS.

The BIOS is up to date (version 1143).

I don't update the Intel microcode at boot time.

Kernel version: 4.20.3

Links:
- https://ark.intel.com/products/33910/Intel-Core-2-Duo-Processor-E8400-6M-Cache-3-00-GHz-1333-MHz-FSB
- https://ark.intel.com/products/50382/Intel-Desktop-Board-DQ35JO
- https://forums.gentoo.org/viewtopic-p-8303270.html
Comment 1 Francesco Turco 2019-01-22 19:11:49 UTC
Created attachment 280673
lscpu.txt
Comment 2 Francesco Turco 2019-01-22 19:12:36 UTC
Created attachment 280675
dmesg.txt
Comment 3 Francesco Turco 2019-01-22 19:13:35 UTC
Created attachment 280677
config

/usr/src/linux/.config
Comment 4 Raffaello D. Di Napoli 2024-01-09 02:25:26 UTC
I noticed I’m getting a similar warning on a Ryzen 5700G system (8C/16T) after I set CONFIG_NR_CPUS to 16, which should be the correct value for such a system.

The warning comes from prefill_possible_map() in arch/x86/kernel/smpboot.c, and there are a few variables involved, so I added a log line to see what the function sees:

---
 __init void prefill_possible_map(void)
 {
          int i, possible;
 
+         pr_info("Raf DEBUG: setup_max_cpus=%d setup_possible_cpus=%d "
+                 "num_processors=%d disabled_cpus=%d\n", setup_max_cpus,
+                 setup_possible_cpus, num_processors, disabled_cpus);
          i = setup_max_cpus ?: 1;           
          if (setup_possible_cpus == -1) {
                  possible = num_processors;
---

This resulted in:

smpboot: Raf DEBUG: setup_max_cpus=16 setup_possible_cpus=-1 num_processors=16 disabled_cpus=16

So where are those 16 disabled CPUs coming from?! There are a few possible places, so I added more logging to all of them. This one turned out to be the right one, in acpi_register_lapic() in arch/x86/kernel/acpi/boot.c:

---
          }             
 
          if (!enabled) {
+                 pr_info("Raf DEBUG: ACPI registering disabled CPU %d %u\n",
+                         id, acpiid);
                  ++disabled_cpus;     
                  return -EINVAL;
          }
---

Which resulted in:

ACPI: Raf DEBUG: ACPI registering disabled CPU 0 16
ACPI: Raf DEBUG: ACPI registering disabled CPU 0 17
...
ACPI: Raf DEBUG: ACPI registering disabled CPU 0 30
ACPI: Raf DEBUG: ACPI registering disabled CPU 0 31

This means that my motherboard’s UEFI is providing, via ACPI, a list of 16 online + 16 ghost offline processors. In other words, a firmware bug; nothing wrong with Linux AFAICT.
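To illustrate how the two counters end up at 16 each, here is a minimal userspace sketch of the bookkeeping, not the kernel code itself. The struct layout, the names lapic_entry, cpu_counts and count_lapic_entries, and the constant LAPIC_FLAG_ENABLED are all made up for this example; the only fact taken from the ACPI specification is that bit 0 of a MADT Local APIC entry's flags field means "enabled".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 0 of the MADT Local APIC "flags" field: processor is enabled. */
#define LAPIC_FLAG_ENABLED 0x1u

/* Hypothetical, simplified view of one MADT Local APIC entry. */
struct lapic_entry {
    uint8_t  acpi_id;   /* ACPI processor ID (second number in the debug log) */
    uint8_t  apic_id;   /* local APIC ID (first number in the debug log) */
    uint32_t flags;     /* bit 0 = enabled */
};

/* Hypothetical container mirroring the two kernel counters in the log. */
struct cpu_counts {
    int num_processors; /* entries the firmware marked as enabled */
    int disabled_cpus;  /* entries present in the table but not enabled */
};

/*
 * Walk the table and count enabled vs. disabled entries, the same way
 * the instrumented code path above bumps disabled_cpus for each entry
 * that is not enabled.
 */
static struct cpu_counts count_lapic_entries(const struct lapic_entry *entries,
                                             size_t n)
{
    struct cpu_counts c = { 0, 0 };

    assert(entries != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (entries[i].flags & LAPIC_FLAG_ENABLED)
            c.num_processors++;
        else
            c.disabled_cpus++;
    }
    return c;
}
```

Feeding it a 32-entry table in which only the first 16 entries have the enabled bit set gives num_processors=16 and disabled_cpus=16, matching the values logged above.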

As to why – I can only guess that whoever wrote that bit of UEFI code got lazy and simply hard-coded the highest possible number of cores, rather than checking with the CPU to discover the actual number.

What to do about it? In my case at least, nothing. Looking at the code around the warning, the only thing happening is that it trims the count of CPUs (originally from ACPI) down to NR_CPUS, which means it drops all those ghost CPUs – which is perfectly fine. So it’s just dmesg noise.
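The trimming described above can be sketched as a small userspace model. This is a simplification under assumptions, not a copy of prefill_possible_map(): it assumes the CPU-hotplug path, where disabled (hot-pluggable) entries are added to the enabled ones before the comparison, and it ignores the setup_max_cpus and setup_possible_cpus command-line overrides. The function name model_possible_cpus and the warned out-parameter are invented for the example.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Simplified model (assumption: hotplug-enabled kernel, no possible_cpus=
 * or maxcpus= overrides). Returns the number of possible CPUs after
 * clamping and sets *warned to 1 if the limit was exceeded.
 */
static int model_possible_cpus(int num_processors, int disabled_cpus,
                               int nr_cpu_ids, int *warned)
{
    assert(nr_cpu_ids > 0);
    assert(warned != NULL);

    /* Enabled CPUs plus the disabled entries reported by the firmware. */
    int possible = num_processors + disabled_cpus;

    *warned = 0;
    if (possible > nr_cpu_ids) {
        /* Same wording as the message in the report. */
        fprintf(stderr, "%d Processors exceeds NR_CPUS limit of %d\n",
                possible, nr_cpu_ids);
        *warned = 1;
        possible = nr_cpu_ids;  /* the ghost CPUs are dropped here */
    }
    return possible;
}
```

With the values logged on the Ryzen system (16 enabled, 16 disabled, limit 16) the model warns and returns 16, so all real CPUs remain usable. The original report's "4 Processors exceeds NR_CPUS limit of 2" would be consistent with 2 enabled plus 2 disabled entries, although the disabled count was not measured on that machine.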