Bug 202385
Summary: | Inexplicable warning message about NR_CPUS limit exceeded | | |
---|---|---|---|
Product: | Other | Reporter: | Francesco Turco (fturco) |
Component: | Other | Assignee: | other_other |
Status: | NEW | | |
Severity: | normal | CC: | rafdev |
Priority: | P1 | | |
Hardware: | All | | |
OS: | Linux | | |
Kernel Version: | 4.20.3 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | lscpu.txt, dmesg.txt, config | | |
Description
Francesco Turco
2019-01-22 19:11:08 UTC
Created attachment 280673
lscpu.txt
Created attachment 280675
dmesg.txt
Created attachment 280677
config
/usr/src/linux/.config
I noticed I'm getting a similar warning on a Ryzen 5700G system (8C/16T) after setting CONFIG_NR_CPUS to 16, which should be the correct value for such a system. The warning comes from prefill_possible_map() in arch/x86/kernel/smpboot.c, and there are a few variables involved, so I added a log line to see what the function sees:

---
 __init void prefill_possible_map(void)
 {
 	int i, possible;
+	pr_info("Raf DEBUG: setup_max_cpus=%d setup_possible_cpus=%d "
+		"num_processors=%d disabled_cpus=%d\n", setup_max_cpus,
+		setup_possible_cpus, num_processors, disabled_cpus);
 	i = setup_max_cpus ?: 1;
 	if (setup_possible_cpus == -1) {
 		possible = num_processors;
---

This resulted in:

smpboot: Raf DEBUG: setup_max_cpus=16 setup_possible_cpus=-1 num_processors=16 disabled_cpus=16

So where are those 16 disabled CPUs coming from?! There are a few possible places, so I added more logging to all of them. The right one turned out to be in acpi_register_lapic() in arch/x86/kernel/acpi/boot.c:

---
 	}
 	if (!enabled) {
+		pr_info("Raf DEBUG: ACPI registering disabled CPU %d %u\n",
+			id, acpiid);
 		++disabled_cpus;
 		return -EINVAL;
 	}
---

Which resulted in:

ACPI: Raf DEBUG: ACPI registering disabled CPU 0 16
ACPI: Raf DEBUG: ACPI registering disabled CPU 0 17
...
ACPI: Raf DEBUG: ACPI registering disabled CPU 0 30
ACPI: Raf DEBUG: ACPI registering disabled CPU 0 31

This means that my motherboard's UEFI is providing, via ACPI, a list of 16 online processors plus 16 ghost processors marked as disabled. In other words, it's a firmware bug; nothing is wrong with Linux AFAICT.

As to why, I can only guess that whoever wrote that bit of UEFI code got lazy and simply hard-coded the highest possible number of cores, rather than querying the CPU for the actual number.

What to do about it? In my case at least, nothing.
Looking at the code around the warning, the only thing happening is that it trims the count of possible CPUs (derived from the ACPI data) down to NR_CPUS. That drops all those ghost CPUs, which is perfectly fine. So it's just dmesg noise.
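To make the arithmetic concrete, here is a minimal user-space model of the clamp in prefill_possible_map(), fed with the values from the "Raf DEBUG" line above. It is a simplified sketch, not kernel code. It assumes CONFIG_HOTPLUG_CPU=y (where, as far as I can tell, firmware-disabled CPUs are added to the possible count when setup_max_cpus is nonzero) and no possible_cpus= boot parameter (setup_possible_cpus == -1). The names model_result and model_possible_cpus are hypothetical.

---
#include <stdbool.h>
#include <stdio.h>

/*
 * Simplified model of the possible-CPU computation in
 * prefill_possible_map(), ASSUMING CONFIG_HOTPLUG_CPU=y and
 * setup_possible_cpus == -1. Hypothetical names, not kernel code.
 */
struct model_result {
	int possible; /* final number of possible CPUs */
	bool warned;  /* whether the NR_CPUS warning would fire */
};

static struct model_result model_possible_cpus(int setup_max_cpus,
					       int num_processors,
					       int disabled_cpus,
					       int nr_cpu_ids)
{
	struct model_result r = { .possible = num_processors, .warned = false };

	/* With CPU hotplug, CPUs that ACPI marks as disabled count as possible. */
	if (setup_max_cpus)
		r.possible += disabled_cpus;

	/* Clamp to NR_CPUS (nr_cpu_ids); this is where the warning is printed. */
	if (r.possible > nr_cpu_ids) {
		r.warned = true;
		r.possible = nr_cpu_ids;
	}
	return r;
}

int main(void)
{
	/* Reported values: 16 enabled + 16 ghost CPUs, CONFIG_NR_CPUS=16. */
	struct model_result r = model_possible_cpus(16, 16, 16, 16);
	printf("possible=%d warned=%s\n", r.possible, r.warned ? "yes" : "no");

	/* Same config, but firmware listing only the 16 real CPUs. */
	struct model_result clean = model_possible_cpus(16, 16, 0, 16);
	printf("possible=%d warned=%s\n", clean.possible, clean.warned ? "yes" : "no");
	return 0;
}
---

Running it prints possible=16 warned=yes for the reported values and possible=16 warned=no without the ghost entries: 16 + 16 = 32 exceeds 16, so the warning fires, but the final count is the same either way.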