Bug 202385 - Inexplicable warning message about NR_CPUS limit exceeded
Status: NEW
Alias: None
Product: Other
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: other_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-01-22 19:11 UTC by Francesco Turco
Modified: 2024-01-09 02:25 UTC
CC List: 1 user

See Also:
Kernel Version: 4.20.3
Subsystem:
Regression: No
Bisected commit-id:


Attachments
lscpu.txt (1000 bytes, text/plain)
2019-01-22 19:11 UTC, Francesco Turco
dmesg.txt (49.39 KB, text/plain)
2019-01-22 19:12 UTC, Francesco Turco
config (88.08 KB, text/plain)
2019-01-22 19:13 UTC, Francesco Turco

Description Francesco Turco 2019-01-22 19:11:08 UTC
I have a Gentoo Linux desktop system with an old dual-core Intel Core 2 Duo E8400 CPU and an Intel DQ35JO motherboard.

Since my CPU is dual-core, I built my kernel with the CONFIG_NR_CPUS=2 option. But after booting my system, I get a warning message in the logs:

Jan 22 19:57:33 desktop kernel: smpboot: 4 Processors exceeds NR_CPUS limit of 2

If I set CONFIG_NR_CPUS=4 instead, the warning message disappears, but I don't understand where the number 4 comes from.

As far as I know, my CPU doesn't support hyper-threading. There is no reference to hyper-threading in the online specs and in the PDF manuals. There is also no reference to hyper-threading in the BIOS.

The BIOS is up to date (version 1143).

I don't update the Intel microcode at boot time.

Kernel version: 4.20.3

Links:
- https://ark.intel.com/products/33910/Intel-Core-2-Duo-Processor-E8400-6M-Cache-3-00-GHz-1333-MHz-FSB
- https://ark.intel.com/products/50382/Intel-Desktop-Board-DQ35JO
- https://forums.gentoo.org/viewtopic-p-8303270.html
Comment 1 Francesco Turco 2019-01-22 19:11:49 UTC
Created attachment 280673 [details]
lscpu.txt
Comment 2 Francesco Turco 2019-01-22 19:12:36 UTC
Created attachment 280675 [details]
dmesg.txt
Comment 3 Francesco Turco 2019-01-22 19:13:35 UTC
Created attachment 280677 [details]
config

/usr/src/linux/.config
Comment 4 Raffaello D. Di Napoli 2024-01-09 02:25:26 UTC
I’m getting a similar warning on a Ryzen 5700G system (8C/16T) after setting CONFIG_NR_CPUS to 16, which should be the correct value for such a system.

The warning comes from prefill_possible_map() in arch/x86/kernel/smpboot.c, and there are a few variables involved, so I added a log line to see what the function sees:

---
 __init void prefill_possible_map(void)
 {
          int i, possible;
 
+         pr_info("Raf DEBUG: setup_max_cpus=%d setup_possible_cpus=%d "
+                 "num_processors=%d disabled_cpus=%d\n", setup_max_cpus,
+                 setup_possible_cpus, num_processors, disabled_cpus);
          i = setup_max_cpus ?: 1;           
          if (setup_possible_cpus == -1) {
                  possible = num_processors;
---

This resulted in:

smpboot: Raf DEBUG: setup_max_cpus=16 setup_possible_cpus=-1 num_processors=16 disabled_cpus=16

So where are those 16 disabled CPUs coming from?! There are a few possible places, so I added more logging to all of them. This one turned out to be the right one, in acpi_register_lapic() in arch/x86/kernel/acpi/boot.c:

---
          }             
 
          if (!enabled) {
+                 pr_info("Raf DEBUG: ACPI registering disabled CPU %d %u\n",
+                         id, acpiid);
                  ++disabled_cpus;     
                  return -EINVAL;
          }
---

Which resulted in:

ACPI: Raf DEBUG: ACPI registering disabled CPU 0 16
ACPI: Raf DEBUG: ACPI registering disabled CPU 0 17
...
ACPI: Raf DEBUG: ACPI registering disabled CPU 0 30
ACPI: Raf DEBUG: ACPI registering disabled CPU 0 31

This means that my motherboard’s UEFI is providing, via ACPI, a list of 16 online + 16 ghost offline processors. In other words, a firmware bug; nothing wrong with Linux AFAICT.
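To illustrate what "16 online + 16 ghost offline" means at the table level, here is a rough userspace sketch that tallies Processor Local APIC entries by their Enabled flag. It assumes the layout given in the ACPI specification for MADT entry type 0 (8 bytes: type, length, ACPI processor UID, APIC ID, then a 32-bit little-endian flags field whose bit 0 is Enabled). The names tally_lapic_entries and struct lapic_tally are made up for this example; this is not the kernel's parser, which also handles x2APIC entries, invalid APIC IDs and other details.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tally of MADT Processor Local APIC entries, split by the Enabled flag. */
struct lapic_tally {
    int enabled;
    int disabled;
};

/*
 * Walk a buffer holding MADT interrupt controller structures (the part of
 * the table that follows its fixed header) and count type-0 entries.
 * Hypothetical helper for illustration only.
 */
struct lapic_tally tally_lapic_entries(const uint8_t *entries, size_t len)
{
    struct lapic_tally t = {0, 0};
    size_t off = 0;

    assert(entries != NULL || len == 0);

    while (off + 2 <= len) {
        uint8_t type = entries[off];
        uint8_t elen = entries[off + 1];

        /* Stop on a malformed or truncated entry. */
        if (elen < 2 || off + elen > len)
            break;

        if (type == 0 && elen == 8) {
            /* Flags start at offset 4; bit 0 of the first byte is Enabled. */
            if (entries[off + 4] & 1)
                t.enabled++;
            else
                t.disabled++;
        }
        off += elen;
    }
    return t;
}
```

Fed the entries from a table like the one described above, such a tally would be expected to report 16 enabled and 16 disabled entries, matching the disabled_cpus=16 seen in the log.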

As to why, I can only guess that whoever wrote that bit of UEFI code got lazy and simply hard-coded the highest possible number of cores, rather than checking with the CPU to discover the actual number.

What to do about it? In my case at least, nothing. Looking at the code around the warning, the only thing happening is that it trims the count of CPUs (originally from ACPI) down to NR_CPUS, which means it drops all those ghost CPUs, and that is perfectly fine. So it’s just dmesg noise.
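The arithmetic behind the warning can be modelled in userspace. The sketch below is a simplified reading of prefill_possible_map() in kernels of that era, not the kernel code itself: it assumes CONFIG_HOTPLUG_CPU=y and a nonzero setup_max_cpus, in which case the disabled entries are added to the enabled ones before the result is compared with the limit. The function name model_possible_cpus is invented for this example.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Simplified model of the possible-CPU count computed at boot.
 * Assumes CONFIG_HOTPLUG_CPU=y and setup_max_cpus != 0 (assumptions of
 * this sketch). setup_possible_cpus == -1 means "not set on the command
 * line". Returns the count after clamping to nr_cpu_ids.
 */
int model_possible_cpus(int num_processors, int disabled_cpus,
                        int setup_possible_cpus, int nr_cpu_ids)
{
    int possible;

    assert(nr_cpu_ids > 0);

    if (setup_possible_cpus == -1)
        possible = num_processors + disabled_cpus; /* enabled + ghost entries */
    else
        possible = setup_possible_cpus;

    if (possible > nr_cpu_ids) {
        /* Same wording as the dmesg warning; the count is then trimmed. */
        fprintf(stderr, "%d Processors exceeds NR_CPUS limit of %d\n",
                possible, nr_cpu_ids);
        possible = nr_cpu_ids;
    }
    return possible;
}
```

With the values logged above (num_processors=16, disabled_cpus=16, limit 16), the model reaches 32, warns, and returns 16. The original report's "4 Processors exceeds NR_CPUS limit of 2" would be consistent with 2 enabled plus 2 disabled entries, although nothing in this thread confirms that. Since an explicit setup_possible_cpus replaces the sum, booting with the possible_cpus= kernel parameter set to the real CPU count may also silence the message; that has not been tested here.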
