Bug 218665

Summary: nohz_full=0 prevents kernel from booting
Product: Process Management
Component: Scheduler
Reporter: Friedrich Oslage (friedrich)
Assignee: Ingo Molnar (mingo)
Status: RESOLVED CODE_FIX
Severity: normal
Priority: P3
CC: regressions
Hardware: AMD
OS: Linux
Kernel Version:
Subsystem:
Regression: Yes
Bisected commit-id: 5797b1c18919cd9c289ded7954383e499f729ce0
Attachments:
- Output when booting with nohz_full=0
- Output when booting with nohz_full=1
- Output when booting with isolcpus=nohz,0

Description Friedrich Oslage 2024-03-31 14:14:27 UTC
Hello,

booting the current kernel (6.9.0-rc1, master/712e1425) on x86_64 with nohz_full=0 causes a page fault and prevents the kernel from booting.

Steps to reproduce:
- make defconfig
- set CONFIG_NO_HZ_FULL=y
- set CONFIG_SUSPEND=n and CONFIG_HIBERNATION=n (to get CONFIG_PM_SLEEP_SMP=n)
- make
- qemu-system-x86_64 -nographic -cpu qemu64 -smp cores=2 -m 1024 -kernel arch/x86/boot/bzImage -append "earlyprintk=ttyS0 console=ttyS0 root=/dev/dummy rootwait nohz_full=0"
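
For convenience, the steps above can be sketched as a small shell helper. This is only a sketch under a few assumptions: it is meant to be sourced from the top of a kernel source tree, the function names (kernel_cmdline, build_kernel, boot_kernel) are made up for illustration, and it assumes that using scripts/config followed by "make olddefconfig" ends up with the same configuration as setting the options by hand. The grep checks are there because that assumption may not hold on every tree.

```shell
# Sketch of the reproduction steps; sourcing this file runs nothing by itself.

# Print the kernel command line from the report. $1 is the parameter under
# test, e.g. "nohz_full=0", "nohz_full=1" or "isolcpus=nohz,0".
kernel_cmdline() {
    printf 'earlyprintk=ttyS0 console=ttyS0 root=/dev/dummy rootwait %s\n' "$1"
}

# Configure and build. Assumption: scripts/config + olddefconfig is equivalent
# to editing the options by hand; the greps verify the resulting .config.
build_kernel() {
    make defconfig &&
    ./scripts/config --disable NO_HZ_IDLE --enable NO_HZ_FULL \
                     --disable SUSPEND --disable HIBERNATION &&
    make olddefconfig &&
    grep -q '^CONFIG_NO_HZ_FULL=y' .config &&
    ! grep -q '^CONFIG_PM_SLEEP_SMP=y' .config &&
    make -j"$(nproc)"
}

# Boot the freshly built image in QEMU with the given parameter.
boot_kernel() {
    qemu-system-x86_64 -nographic -cpu qemu64 -smp cores=2 -m 1024 \
        -kernel arch/x86/boot/bzImage -append "$(kernel_cmdline "$1")"
}
```

With this sourced, "build_kernel && boot_kernel nohz_full=0" should correspond to the failing case, and "boot_kernel nohz_full=1" or "boot_kernel isolcpus=nohz,0" to the two working ones.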

I have attached the output of a failed nohz_full=0 boot as nohz_full_0.txt and - for reference - the output of a nohz_full=1 boot as nohz_full_1.txt.

Interestingly enough, using the deprecated isolcpus parameter (isolcpus=nohz,0) to enable NO_HZ for cpu0 works. I've attached the output as isolcpus_nohz_0.txt.

Bisecting showed 5797b1c18919cd9c289ded7954383e499f729ce0 as the first bad commit.
Comment 1 Friedrich Oslage 2024-03-31 14:15:02 UTC
Created attachment 306068
Output when booting with nohz_full=0
Comment 2 Friedrich Oslage 2024-03-31 14:15:17 UTC
Created attachment 306069
Output when booting with nohz_full=1
Comment 3 Friedrich Oslage 2024-03-31 14:15:41 UTC
Created attachment 306070
Output when booting with isolcpus=nohz,0
Comment 4 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-04-02 08:29:37 UTC
Forwarded by mail:
https://lore.kernel.org/all/5be248c6-cdda-4d2e-8fae-30fc2cc124c0@leemhuis.info/
Comment 5 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-04-04 05:55:51 UTC
This looks like another report about the problem: https://lore.kernel.org/all/20240402105847.GA24832@redhat.com/

#regzbot monitor https://lore.kernel.org/all/20240402105847.GA24832@redhat.com/
Comment 6 Friedrich Oslage 2024-05-06 19:24:52 UTC
The bug was fixed in commit 5097cbcb [1], which is included in kernels >= 6.9-rc6.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5097cbcb38e6e0d2627c9dde1985e91d2c9f880e