Bug 200959 - Early boot CPU stalling and then frozen boot - related to kernel/time/clocksource.c
Summary: Early boot CPU stalling and then frozen boot - related to kernel/time/clockso...
Status: RESOLVED DUPLICATE of bug 200957
Alias: None
Product: Timers
Classification: Unclassified
Component: Other
Hardware: x86-64 Linux
Importance: P1 blocking
Assignee: john stultz
URL: https://bbs.archlinux.org/viewtopic.p...
Keywords:
Depends on:
Blocks:
 
Reported: 2018-08-28 15:08 UTC by Siegfried Metz
Modified: 2018-08-29 18:44 UTC
CC List: 3 users

See Also:
Kernel Version: 4.18 up to 4.18.5
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
Arch default kernel config (212.99 KB, text/plain)
2018-08-28 15:11 UTC, Siegfried Metz
/proc/version (129 bytes, text/plain)
2018-08-28 15:13 UTC, Siegfried Metz
ver_linux (1.64 KB, text/plain)
2018-08-28 15:14 UTC, Siegfried Metz
/proc/cpuinfo (1.69 KB, text/plain)
2018-08-28 15:14 UTC, Siegfried Metz
Arch default kernel config 4.18.5 - correct one (212.51 KB, text/plain)
2018-08-28 21:09 UTC, Siegfried Metz

Description Siegfried Metz 2018-08-28 15:08:57 UTC
Hey,
I have an Intel Core 2 Duo E8500 processor, and since mainline kernel 4.18 (up to and including the latest, 4.18.5) I have been unable to boot: the kernel hangs early in the boot process.

This affects Core 2 Duo processors. In addition, a user on the Archlinux forum reported that a more recent CPU, an AMD Ryzen 1700, might be affected as well.

I am using Archlinux as my distribution, with the testing repository enabled.

A few users in the helpful Archlinux community used "git bisect" to track the issue down to this bad commit:
[7197e77abcb65a71d0b21d67beb24f153a96055e] clocksource: Remove kthread 

The only viable ways to run any of the 4.18.x kernels are to revert the bad commit or to add the kernel boot parameter "clocksource=hpet".
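To confirm that the workaround is actually in effect after rebooting, one can inspect the running kernel's command line. This is a minimal Python sketch, not part of the original report: the helper name `boot_param` is hypothetical, and it assumes the standard `/proc/cmdline` file with simple whitespace-separated parameters (quoted values containing spaces are not handled).

```python
from pathlib import Path
from typing import Optional


def boot_param(name: str, cmdline: Optional[str] = None) -> Optional[str]:
    """Return the value of `name=value` on the kernel command line, or None.

    Hypothetical helper: reads /proc/cmdline when `cmdline` is not given.
    Tokens are split on whitespace; quoted values with spaces are not handled.
    Flag-style parameters without "=" (e.g. "quiet") are ignored.
    If the parameter appears more than once, this sketch keeps the last one.
    """
    if cmdline is None:
        cmdline = Path("/proc/cmdline").read_text()
    value = None
    for token in cmdline.split():
        # partition() splits on the first "=", so "root=UUID=abc" -> "UUID=abc"
        key, sep, val = token.partition("=")
        if key == name and sep:
            value = val
    return value
```

On a system booted with the workaround, `boot_param("clocksource")` would be expected to return `"hpet"`; it returns `None` if the parameter was not passed.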

This is how I determined what the clocksource parameter should be:
"cat /sys/devices/system/clocksource/clocksource0/available_clocksource" (my case: hpet or acpi_pm)
"cat /sys/devices/system/clocksource/clocksource0/current_clocksource" (my case: hpet)
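The two sysfs reads above can also be wrapped in a small script. This is a minimal Python sketch, not from the original report; the function names are hypothetical, and it assumes the standard `clocksource0` sysfs directory. Parsing is kept separate from file access so the logic can be checked without a live system.

```python
from pathlib import Path
from typing import List, Tuple

# Standard sysfs location used in the report; overridable for testing.
SYSFS_CLOCKSOURCE = Path("/sys/devices/system/clocksource/clocksource0")


def parse_clocksources(available_text: str, current_text: str) -> Tuple[List[str], str]:
    """Turn the raw contents of the two sysfs files into usable values.

    available_clocksource is a space-separated list of names;
    current_clocksource holds a single name. split()/strip() drop
    the trailing whitespace and newline.
    """
    return available_text.split(), current_text.strip()


def read_clocksources(base: Path = SYSFS_CLOCKSOURCE) -> Tuple[List[str], str]:
    """Return (available clocksources, current clocksource) read from `base`."""
    available = (base / "available_clocksource").read_text()
    current = (base / "current_clocksource").read_text()
    return parse_clocksources(available, current)
```

On the reporter's machine booted with clocksource=hpet, `read_clocksources()` would be expected to list hpet and acpi_pm as available and report hpet as current.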

I also verified that the latest (now EOL'd) 4.17 kernel is unaffected by this bug: I built kernel 4.17.19 myself, based on the latest Archlinux 4.17.14 kernel config, and it boots normally with no sign of this issue.


MAIN-PART:

I booted via grub2 with the kernel parameters "earlyprintk=vga debug break=postmount" (without quiet) and got the output below. The boot stalls so early that there is no dmesg, no log of any kind, and no /new_root mount or any other mount at all (I did not use netconsole or similar).

With these boot parameters, the boot stalls for about 20 to 30 seconds before more output appears:

---

INFO: rcu_preempt detected stalls on CPUs/tasks:
    0-...!: (0 ticks this GP) idle=608/0/0 softirq=21/21 fqs=0 last_accelerate: e800/e800, non-lazy_posted: 549, ..
    (detected by 1, t=18220 jiffies, g=-284, c=-285, q=236)
Sending NMI from CPU1 to CPUs0:
NMI backtrace for CPU0 skipped: idling at acpi_processor_ffh_cstate_enter+0x67/0xb0
rcu_preempt kthread starved for 18220 jiffies! g18446744073709551332 c18446744073709551332 f0x0 RCU_GP_WAIT_FQS(3) -> state=0x402 -> cpu=0
RCU grace-period kthread stack dump:
rcu_preempt      I    0    10      2 0x80000000

Call Trace:
? __schedule+0x29b/0x8b0
schedule+0x32/0x90
schedule_timeout+0x1d1/0x4a0
? collect_expired_timers+0xa0/0xa0
rcu_gp_kthread+0x43e/0x950
? synchronize_rcu_expedited+0x30/0x30
kthread+0x112/0x130
? kthread_flush_work_fn+0x10/0x10
ret_from_fork+0x35/0x40

---
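The very large g/c numbers in the "rcu_preempt kthread starved" line look like small negative counters printed as unsigned 64-bit values: reinterpreting 18446744073709551332 as a signed 64-bit integer gives -284, the same figure as g= in the "detected by" line. A minimal Python sketch of that two's-complement reinterpretation (the helper name is hypothetical):

```python
def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as a two's-complement signed one."""
    if not 0 <= value < 2**64:
        raise ValueError("value does not fit in 64 bits")
    # Values with the top bit set represent negatives: subtract 2**64.
    return value - 2**64 if value >= 2**63 else value
```

For example, `as_signed64(18446744073709551332)` returns `-284`.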

Keep in mind that I have a Core 2 Duo, and building the kernel manually from a default Archlinux kernel config takes at least 2 hours. If you want me to test the latest mainline git 4.19-rc kernel, build with any patches applied, or even enable CONFIG_DEBUG_INFO, it will take considerably more time on my "old power horse". ;-)

I am happy to provide more information if needed. Just let me know.

Thanks to the awesome collaboration of the Archlinux community, we managed to track this down, and I took the initiative to report my first kernel bug upstream. :)

I read through the kernel bug-reporting guides thoroughly, but I hope you will go easy on any obvious mistakes.
Comment 1 Siegfried Metz 2018-08-28 15:11:18 UTC
Created attachment 278159
Arch default kernel config
Comment 2 Siegfried Metz 2018-08-28 15:13:02 UTC
Created attachment 278161
/proc/version
Comment 3 Siegfried Metz 2018-08-28 15:14:07 UTC
Created attachment 278163
ver_linux
Comment 4 Siegfried Metz 2018-08-28 15:14:59 UTC
Created attachment 278165
/proc/cpuinfo
Comment 5 Siegfried Metz 2018-08-28 15:27:21 UTC
Same as bug: https://bugzilla.kernel.org/show_bug.cgi?id=200957
Comment 6 Siegfried Metz 2018-08-28 21:09:49 UTC
Created attachment 278183
Arch default kernel config 4.18.5 - correct one

This is definitely the config of the currently running 4.18.5 kernel, booted with clocksource=hpet.
Comment 7 john stultz 2018-08-29 18:44:42 UTC

*** This bug has been marked as a duplicate of bug 200957 ***
