Bug 192111 - Linux 4.9.x doesn't boot on some intel CPUs (T4200, i5-5200U, i7-4700 MQ, i7-4910 MQ) with ACPI activated
Status: CLOSED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 blocking
Assignee: other_other
Reported: 2017-01-08 17:05 UTC by Frederic Bezies
Modified: 2017-02-09 10:03 UTC

Kernel Version: 4.9
Regression: No

Attachments
my lspci log (1.97 KB, text/x-log)
2017-01-08 17:05 UTC, Frederic Bezies
archlinux kernel config (181.28 KB, application/octet-stream)
2017-01-09 17:50 UTC, Frederic Bezies
First-cut patch to allow synchronous grace periods in mid-boot time (11.62 KB, patch)
2017-01-12 05:11 UTC, Paul E. McKenney
First-cut patch against 4.10-rc3 to allow synchronous grace periods in mid-boot time (11.48 KB, patch)
2017-01-12 06:27 UTC, Paul E. McKenney
Simplified patch against 4.10-rc3 to allow synchronous grace periods in mid-boot time (12.14 KB, patch)
2017-01-12 13:10 UTC, Paul E. McKenney

Description Frederic Bezies 2017-01-08 17:05:09 UTC
Created attachment 250801
my lspci log

I faced a blocking bug: I cannot get Linux 4.9.x to boot on my Toshiba laptop with an Intel T4200 CPU.

I'm using Arch Linux, and when I try to boot, it freezes while loading the initramfs.

I opened a bug on the Arch Linux bug tracker here: https://bugs.archlinux.org/task/52246

I also tried Manjaro 17.0 alpha, which uses the Linux 4.9 kernel, and I got a lot of ACPI errors before a kernel panic.

I'm attaching my lspci log in case it helps.
Comment 1 Gene 2017-01-08 19:22:19 UTC
Duplicate of https://bugzilla.kernel.org/show_bug.cgi?id=191801
Comment 2 Frederic Bezies 2017-01-08 20:29:55 UTC
(In reply to Gene from comment #1)
> Duplicate of https://bugzilla.kernel.org/show_bug.cgi?id=191801

Not completely. My laptop is not EFI-based. It is an old one with a BIOS.
Comment 3 Frederic Bezies 2017-01-09 17:48:39 UTC
If I add acpi=off to the GRUB kernel command line, it boots, but there is no working display.
Comment 4 Frederic Bezies 2017-01-09 17:50:16 UTC
Created attachment 250971
archlinux kernel config

Arch Linux config file for Linux 4.9.1.
Comment 5 Frederic Bezies 2017-01-10 09:53:55 UTC
Could this be related to this closed bug?

https://bugzilla.kernel.org/show_bug.cgi?id=188221
Comment 6 Frederic Bezies 2017-01-10 09:55:02 UTC
Changed product to ACPI, because I can boot with acpi=off, even though I do not get a working display after that.
Comment 7 Borislav Petkov 2017-01-10 13:50:56 UTC
Any chance you could bisect it?

https://wiki.gentoo.org/wiki/Kernel_git-bisect
Comment 8 Frederic Bezies 2017-01-10 14:03:48 UTC
I don't have enough free time to do a bisect. By the way, I tried applying the patch from bug 188221, without luck.

My laptop is running the 4.4.41 LTS Linux kernel, and I think it will stay on it until 4.9 is in better shape. I cannot imagine 4.9 being chosen as an LTS version after such a bad start.
Comment 9 Ivan 2017-01-11 02:58:09 UTC
I'm also unable to boot any kernel after 4.9-rc1.
I get the same result with acpi=off: it boots, but there is no display.
I bisected and ended up at:

rcu: Drive expedited grace periods from workqueue
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=8b355e3bc1408be238ae4695fb6318ae502cae8e

After reverting that commit my computer booted the kernel.
I'm using a Toshiba C855D laptop with an A10-4600M APU in EFI mode.
Comment 10 Lee, Chun-Yi 2017-01-11 09:16:52 UTC
(In reply to Ivan from comment #9)
> I'm also unable to boot any kernel after 4.9-rc1.
> I get the same result with acpi=off: it boots, but there is no display.
> I bisected and ended up at:
> 
> rcu: Drive expedited grace periods from workqueue
> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=8b355e3bc1408be238ae4695fb6318ae502cae8e
> 
> After reverting that commit my computer booted the kernel.
> I'm using a Toshiba C855D laptop with an A10-4600M APU in EFI mode.

That is plausible, because acpi/osl.c has used synchronize_rcu_expedited() in acpi_os_map_cleanup() since v3.19 to speed up the grace period.

The details of 8b355e3bc need comment from an RCU expert.
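
As a rough illustration of the suspected failure mode, here is a toy user-space model in C (not kernel code). The idea: if an expedited grace period is handed to a workqueue and the caller then waits for it, the wait can only finish once something actually runs the queued work. Every name below (toy_workqueue, toy_queue_work, toy_expedited_gp_would_complete) is hypothetical, and the assumption that no worker is available yet when ACPI calls in during early boot is a guess at the mechanism, not something established in this report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a workqueue; this is not a kernel API. */
struct toy_workqueue {
    bool workers_running;  /* false until worker threads exist */
    int pending;           /* queued items that nobody has executed yet */
};

/* Queue one item. It only gets executed if a worker is available. */
static void toy_queue_work(struct toy_workqueue *wq)
{
    assert(wq != NULL);
    wq->pending++;
    if (wq->workers_running)
        wq->pending--;     /* a worker runs the item */
}

/*
 * Model of a caller that queues grace-period work and then waits for it.
 * Returns true if the wait would finish, false if it would block forever.
 */
static bool toy_expedited_gp_would_complete(struct toy_workqueue *wq)
{
    toy_queue_work(wq);
    return wq->pending == 0;
}
```

With workers_running set to false, toy_expedited_gp_would_complete() returns false, which corresponds to a boot that hangs; with it set to true, the call returns true.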
Comment 11 Borislav Petkov 2017-01-11 09:19:25 UTC
Interesting!

Thanks for bisecting. This looks similar to another issue we're
debugging which points at the same place: acpi calling into RCU too
early. I've added Paul to CC - I hope that is his bugzilla email but
I'll ping him otherwise too.

Lemme get confirmation for this from other bug reporters of probably the
same issue.

Thanks!
Comment 12 Frederic Bezies 2017-01-11 09:25:22 UTC
(In reply to Ivan from comment #9)
> I'm also unable to boot any kernel after 4.9-rc1.
> I get the same result with acpi=off: it boots, but there is no display.
> I bisected and ended up at:
> 
> rcu: Drive expedited grace periods from workqueue
> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=8b355e3bc1408be238ae4695fb6318ae502cae8e
> 
> After reverting that commit my computer booted the kernel.
> I'm using a Toshiba C855D laptop with an A10-4600M APU in EFI mode.

Thanks for the bisecting effort. I will check whether my old laptop, which uses a T4200 CPU and a BIOS, boots too. Keeping fingers crossed.
Comment 13 Paul E. McKenney 2017-01-11 10:20:46 UTC
I am working on a fix, and hope to post it by this time tomorrow.  It is a bit more complex than I would like for -rc4.  I think I can simplify it, but plan to send out the more complex one as a proof of concept.
Comment 14 Frederic Bezies 2017-01-11 12:40:32 UTC
(In reply to Paul E. McKenney from comment #13)
> I am working on a fix, and hope to post it by this time tomorrow.  It is a
> bit more complex than I would like for -rc4.  I think I can simplify it, but
> plan to send out the more complex one as a proof of concept.

Reverting commit 8b355e3bc on the Linux 4.9.2 kernel works. My Intel-based laptop (T4200 CPU) at least boots with Linux 4.9.x.

Is there any hope of getting the fix backported to Linux 4.9.x?
Comment 15 Paul E. McKenney 2017-01-12 01:53:29 UTC
(In reply to Frederic Bezies from comment #14)
> (In reply to Paul E. McKenney from comment #13)
> > I am working on a fix, and hope to post it by this time tomorrow.  It is a
> > bit more complex than I would like for -rc4.  I think I can simplify it,
> > but plan to send out the more complex one as a proof of concept.
> 
> Reverting commit 8b355e3bc on the Linux 4.9.2 kernel works. My Intel-based
> laptop (T4200 CPU) at least boots with Linux 4.9.x.
> 
> Is there any hope of getting the fix backported to Linux 4.9.x?

Maybe.  The initial implementation is a bit hairy, but things are looking good for a simplified one.  I will be sending out the hairy initial implementation in a few hours, if all goes well.  And if things go well after that, the simplified one.  ;-)
Comment 16 Paul E. McKenney 2017-01-12 05:11:54 UTC
Created attachment 251291
First-cut patch to allow synchronous grace periods in mid-boot time

This patch is a first cut at permitting synchronous RCU grace periods (synchronize_rcu() and friends) during the time starting with spawning of the first task and ending after the early_init() functions have all been invoked.

Please test!

There is likely to be a simpler follow-on patch, but there are enough similarities that testing of this patch produces useful data.
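
The window described above can be pictured as a three-phase model. The sketch below is a user-space illustration in C only. The enum and function names are hypothetical and are not taken from the attached patch, and the strategy chosen for each phase is an assumption about how such a fix could be structured: a no-op while only one context can run, the caller driving the grace period itself in mid-boot, and a hand-off to a worker once the system is fully up.

```c
#include <assert.h>

/*
 * Toy boot phases. The names are hypothetical and only mirror the window
 * described in the comment above; they are not taken from the patch.
 */
enum toy_boot_phase {
    TOY_BEFORE_FIRST_TASK,  /* nothing else can run yet */
    TOY_MID_BOOT,           /* first task spawned, early init still running */
    TOY_FULLY_RUNNING       /* worker threads are available */
};

/* Assumed ways of satisfying a synchronous grace-period request. */
enum toy_gp_strategy {
    TOY_GP_NOOP,            /* no reader can exist, so return immediately */
    TOY_GP_DRIVE_DIRECTLY,  /* the caller does the grace-period work itself */
    TOY_GP_DEFER_TO_WORKER  /* hand off to a worker and wait for completion */
};

/* Pick a strategy that never waits on a worker that does not exist yet. */
static enum toy_gp_strategy toy_pick_strategy(enum toy_boot_phase phase)
{
    switch (phase) {
    case TOY_BEFORE_FIRST_TASK:
        return TOY_GP_NOOP;
    case TOY_MID_BOOT:
        return TOY_GP_DRIVE_DIRECTLY;
    case TOY_FULLY_RUNNING:
        return TOY_GP_DEFER_TO_WORKER;
    }
    assert(!"unknown boot phase");
    return TOY_GP_NOOP;
}
```

The point of the middle case is that a request made in mid-boot is neither rejected nor queued for a worker that may not be running yet.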
Comment 17 Paul E. McKenney 2017-01-12 05:21:16 UTC
And that patch doesn't apply anywhere useful at the moment.  Will fix, but in the meantime, another place to find the bits is my -rcu tree at git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git:
40833bca3e ("rcu: Narrow early boot window of illegal synchronous grace periods").
Comment 18 Paul E. McKenney 2017-01-12 06:27:13 UTC
Created attachment 251301
First-cut patch against 4.10-rc3 to allow synchronous grace periods in mid-boot time

This patch (against v4.10-rc3) is a first cut at permitting synchronous RCU grace periods (synchronize_rcu() and friends) during the time starting with spawning of the first task and ending after the early_init() functions have all been invoked.

Please test!

There is likely to be a simpler follow-on patch, but there are enough similarities that testing of this patch produces useful data.
Comment 19 Borislav Petkov 2017-01-12 08:52:12 UTC
(In reply to Paul E. McKenney from comment #18)
> Created attachment 251301
> First-cut patch against 4.10-rc3 to allow synchronous grace periods in
> mid-boot time
> 
> This patch (against v4.10-rc3) is a first cut at permitting synchronous RCU
> grace periods (synchronize_rcu() and friends) during the time starting with
> spawning of the first task and ending after the early_init() functions have
> all been invoked.
> 
> Please test!

Looks good.

However, this is on the laptop we were debugging on the ML with the
iommu intremap early init so it might or might not work on the toshibas.

Also, the patch applies to 4.9. Should I build test kernels for people
to verify or do you need other patches for 4.9 for this patch to work?

Thanks Paul.
Comment 20 Paul E. McKenney 2017-01-12 09:32:06 UTC
(In reply to Borislav Petkov from comment #19)
> (In reply to Paul E. McKenney from comment #18)
> > Created attachment 251301
> > First-cut patch against 4.10-rc3 to allow synchronous grace periods in
> > mid-boot time
> > 
> > This patch (against v4.10-rc3) is a first cut at permitting synchronous RCU
> > grace periods (synchronize_rcu() and friends) during the time starting with
> > spawning of the first task and ending after the early_init() functions have
> > all been invoked.
> > 
> > Please test!
> 
> Looks good.
> 
> However, this is on the laptop we were debugging on the ML with the
> iommu intremap early init so it might or might not work on the toshibas.
> 
> Also, the patch applies to 4.9. Should I build test kernels for people
> to verify or do you need other patches for 4.9 for this patch to work?
> 
> Thanks Paul.

Hard to believe, but I see no reason the same patch cannot work for 4.9.  Sometimes you get lucky, I guess.

Testing for the simpler patch is in flight, keeping fingers firmly crossed.  :-)
Comment 21 Borislav Petkov 2017-01-12 09:37:33 UTC
(In reply to Paul E. McKenney from comment #20)
> Hard to believe, but I see no reason the same patch cannot work for 4.9.

Hard to believe that it applies? Try it :-)

Or that it works? Now that I do believe :-)))

> Sometimes you get lucky, I guess.
> 
> Testing for the simpler patch is in flight, keeping fingers firmly crossed.  :-)

Ok, ping me when it is ready and I'll run it and prep a test kernel with
it.

Thanks.
Comment 22 Paul E. McKenney 2017-01-12 09:51:16 UTC
(In reply to Borislav Petkov from comment #21)
> (In reply to Paul E. McKenney from comment #20)
> > Hard to believe, but I see no reason the same patch cannot work for 4.9.
> 
> Hard to believe that it applies? Try it :-)
> 
> Or that it works? Now that I do believe :-)))

At some point, we must face the fact that it is what the computers believe that is really important.  ;-)

> > Sometimes you get lucky, I guess.
> > 
> > Testing for the simpler patch is in flight, keeping fingers firmly crossed.  :-)
> 
> Ok, ping me when it is ready and I'll run it and prep a test kernel with
> it.

Will do!  Here is hoping!  ;-)
Comment 23 Paul E. McKenney 2017-01-12 13:10:31 UTC
Created attachment 251331
Simplified patch against 4.10-rc3 to allow synchronous grace periods in mid-boot time

This patch (against v4.10-rc3) is a simplified way of permitting synchronous RCU grace periods (synchronize_rcu() and friends) during the time starting with spawning of the first task and ending after the early_init() functions have all been invoked.

Please test!
Comment 24 Borislav Petkov 2017-01-12 13:43:12 UTC
(In reply to Paul E. McKenney from comment #23)
> Please test!

Works too. Lemme build a test kernel for folks to verify.

Thanks Paul.
Comment 25 Ivan 2017-01-12 21:40:43 UTC
(In reply to Paul E. McKenney from comment #23)
> Created attachment 251331
> Simplified patch against 4.10-rc3 to allow synchronous grace periods in
> mid-boot time
> 
> This patch (against v4.10-rc3) is a simplified way of permitting synchronous
> RCU grace periods (synchronize_rcu() and friends) during the time starting
> with spawning of the first task and ending after the early_init() functions
> have all been invoked.
> 
> Please test!

Compiled 4.9.3 with your patch and it booted. Thanks!!
Comment 26 Paul E. McKenney 2017-01-13 05:58:46 UTC
(In reply to Ivan from comment #25)
> (In reply to Paul E. McKenney from comment #23)
> > Created attachment 251331
> > Simplified patch against 4.10-rc3 to allow synchronous grace periods in
> > mid-boot time
> > 
> > This patch (against v4.10-rc3) is a simplified way of permitting
> > synchronous RCU grace periods (synchronize_rcu() and friends) during the
> > time starting with spawning of the first task and ending after the
> > early_init() functions have all been invoked.
> > 
> > Please test!
> 
> Compiled 4.9.3 with your patch and it booted. Thanks!!

Thank you, Ivan!  I have added your Tested-by.
Comment 27 Frederic Bezies 2017-01-13 13:09:49 UTC
(In reply to Paul E. McKenney from comment #26)
> (In reply to Ivan from comment #25)
> > (In reply to Paul E. McKenney from comment #23)
> > > Created attachment 251331
> > > Simplified patch against 4.10-rc3 to allow synchronous grace periods in
> > > mid-boot time
> > > 
> > > This patch (against v4.10-rc3) is a simplified way of permitting
> > > synchronous RCU grace periods (synchronize_rcu() and friends) during
> > > the time starting with spawning of the first task and ending after the
> > > early_init() functions have all been invoked.
> > > 
> > > Please test!
> > 
> > Compiled 4.9.3 with your patch and it booted. Thanks!!
> 
> Thank you, Ivan!  I have added your Tested-by.

And another satisfied tester. Thanks a lot for the patch. Until it is added to 4.9.4/4.9.5, I think my main computer will be doing some kernel compiling.
Comment 28 Len Brown 2017-01-23 23:44:05 UTC
The commit is in Linux v4.10-rc5:

52d7e48b86fc108e45a656d8e53e4237993c481d
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Date: Tue, 10 Jan 2017 02:28:26 -0800
Subject: rcu: Narrow early boot window of illegal synchronous grace periods

Closed.
Comment 29 Len Brown 2017-01-23 23:48:39 UTC
Note that this patch is not present in v4.9.5.
Comment 30 Artem 2017-02-09 09:28:15 UTC
(In reply to Len Brown from comment #29)
> note that this patch is not present in v4.9.5

It is not present in v4.9.6 or v4.9.7 either, so I am not able to boot on Fedora 24/25. Will the patch be applied to the 4.9 branch? The upcoming Debian release uses v4.9.1, which is frustrating.
Comment 31 Borislav Petkov 2017-02-09 10:03:24 UTC
It is in the stable tree as commit:

  90687fc3c8c3 ("rcu: Narrow early boot window of illegal synchronous grace periods")

and it is in v4.9.6 and newer.
