Bug 202475 - Boot crash in acpi
Summary: Boot crash in acpi
Status: RESOLVED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: ACPICA-Core (show other bugs)
Hardware: All Linux
: P1 high
Assignee: Erik Schmauss
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-01-31 10:53 UTC by Michael J Gruber
Modified: 2019-04-12 19:05 UTC (History)
3 users (show)

See Also:
Kernel Version: 4.20.5 (and 4.19 series)
Tree: Mainline
Regression: Yes


Attachments
lspci (2.22 KB, text/plain)
2019-02-05 14:25 UTC, Michael J Gruber
Details
lscpu (1008 bytes, text/plain)
2019-02-05 14:26 UTC, Michael J Gruber
Details
4.19.3 boot kernelmsgs without crash (94.32 KB, text/plain)
2019-02-05 14:26 UTC, Michael J Gruber
Details
4.19.7 boot kernelmsgs with crash (89.10 KB, text/plain)
2019-02-05 14:26 UTC, Michael J Gruber
Details
4.19.8 atomacpi (revert patch) boot kernelmsgs without crash (88.32 KB, text/plain)
2019-02-05 14:27 UTC, Michael J Gruber
Details
4.19.8 boot kernelmsgs with crash (85.06 KB, text/plain)
2019-02-05 14:27 UTC, Michael J Gruber
Details
4.19.2-300 boot kernelmsgs with crash (84.58 KB, text/plain)
2019-02-05 14:28 UTC, Michael J Gruber
Details
4.19.2-301 boot kernelmsgs with crash (84.10 KB, text/plain)
2019-02-05 14:29 UTC, Michael J Gruber
Details
4.19.4-300 boot kernelmsgs without crash (92.02 KB, text/plain)
2019-02-05 14:29 UTC, Michael J Gruber
Details
4.19.5 boot kernelmsgs without crash (89.56 KB, text/plain)
2019-02-05 14:30 UTC, Michael J Gruber
Details
4.19.6 boot kernelmsgs with crash (85.63 KB, text/plain)
2019-02-05 14:30 UTC, Michael J Gruber
Details
4.19.9 boot kernelmsgs with crash (85.89 KB, text/plain)
2019-02-05 14:30 UTC, Michael J Gruber
Details
4.19.9 boot kernelmsgs with crash (85.89 KB, text/plain)
2019-02-05 14:31 UTC, Michael J Gruber
Details
4.20.0 boot kernelmsgs with crash (86.33 KB, text/plain)
2019-02-05 14:31 UTC, Michael J Gruber
Details
4.20.5 no crash with revert patch (88.30 KB, text/plain)
2019-02-05 14:34 UTC, Michael J Gruber
Details
4.20.5 crash (88.55 KB, text/plain)
2019-02-05 14:34 UTC, Michael J Gruber
Details
4.16.16-300 (Fedora 28 kernel on fedora 29) no crash (88.44 KB, text/plain)
2019-02-05 14:35 UTC, Michael J Gruber
Details
4.20.6 with revert patch (no crash) (90.30 KB, text/plain)
2019-02-05 14:36 UTC, Michael J Gruber
Details
acpidump (335.79 KB, text/plain)
2019-02-05 14:36 UTC, Michael J Gruber
Details
4.16.16 acpi.debug (92.82 KB, text/plain)
2019-02-06 20:17 UTC, Michael J Gruber
Details
4.20.5 acpi.debug (90.83 KB, text/plain)
2019-02-06 20:18 UTC, Michael J Gruber
Details
4.16.16 debug msgs (719.12 KB, application/x-xz)
2019-02-07 20:17 UTC, Michael J Gruber
Details
4.20.5 debug msgs (604.15 KB, application/x-xz)
2019-02-07 20:18 UTC, Michael J Gruber
Details
remove op region address after control method execution (996 bytes, patch)
2019-02-21 20:39 UTC, Erik Schmauss
Details | Diff
4.20.11 stock Fedora debug kernel (16.64 KB, application/x-xz)
2019-02-23 11:54 UTC, Michael J Gruber
Details
4.20.11 debug kernel with suggested patch (16.57 KB, application/x-xz)
2019-02-23 11:55 UTC, Michael J Gruber
Details
4.20.11 debug kernel with the revert (16.55 KB, application/x-xz)
2019-02-23 11:55 UTC, Michael J Gruber
Details
4.20.5 debug msgs (new options) (16.92 KB, application/x-xz)
2019-02-26 20:22 UTC, Michael J Gruber
Details
4.20.5 patched debug msgs (new options) (16.61 KB, application/x-xz)
2019-02-26 20:23 UTC, Michael J Gruber
Details
4.20.5 patched non-debug msgs (14.32 KB, application/x-xz)
2019-02-26 20:23 UTC, Michael J Gruber
Details

Description Michael J Gruber 2019-01-31 10:53:54 UTC
Most kernels in the 4.19 and 4.20 series crash during boot in the acpi component on specific hardware (see below).

This is 100% reproducible.

By bisection, the culprit is commit 4abb951b73ff ("ACPICA: AML interpreter: add region addresses in global list during initialization"), which was first introduced in upstream v4.19.2 (commit 22083c028d0b), reverted in v4.19.3 (commit 8ef305fbc50d), and finally included again in v4.19.6 (commit 87403e35bc56).

Jeremy Cline sent an email in response to its inclusion in v4.19.6, but it must have slipped through the cracks for the ACPI maintainers.

Original Fedora bug: https://bugzilla.redhat.com/show_bug.cgi?id=1659225

Note that I tested both with vanilla and with Fedora kernels.

This is is an old HP/Compaq 6715b laptop. lscpu and lspci output is attached to the Fedora bug, as well as kernel msgs with and without crash for several kernel versions. Also, an atom user reported a similar problem that disappeared with the same revert.

Please let me know if I should repost the attachments from the Fedora bz over here.
Comment 1 Erik Schmauss 2019-01-31 21:23:29 UTC
(In reply to M.J.G. from comment #0)
> Most kernels in the 4.19 and 4.20 series crash during boot in the acpi
> component on specific hardware (see below).
> 
> This is 100% reproducible.
> 
> By bisection, the culprit is commit 4abb951b73ff ("ACPICA: AML interpreter:
> add region addresses in global list during initialization"), which was first
> introduced in upstream v4.19.2 (commit 22083c028d0b), reverted in v4.19.3
> (commit 8ef305fbc50d), and finally included again in v4.19.6 (commit
> 87403e35bc56).
> 
> Jeremy Cline sent an email in response to its inclusion in v4.19.6, but it
> must have slipped through the cracks for the ACPI maintainers.
> 
> Original Fedora bug: https://bugzilla.redhat.com/show_bug.cgi?id=1659225
> 
> Note that I tested both with vanilla and with Fedora kernels.
> 
> This is is an old HP/Compaq 6715b laptop. lscpu and lspci output is attached
> to the Fedora bug, as well as kernel msgs with and without crash for several
> kernel versions. Also, an atom user reported a similar problem that
> disappeared with the same revert.
> 
> Please let me know if I should repost the attachments from the Fedora bz
> over here.

Please do, that would be helpful. Also, could you try booting with an older kernel? I'm thinking 4.16 or earlier and post the dmesg here. Also, please post the ACPI dump.

https://01.org/linux-acpi/utilities
Comment 2 Michael J Gruber 2019-02-05 14:25:42 UTC
Created attachment 280967 [details]
lspci
Comment 3 Michael J Gruber 2019-02-05 14:26:11 UTC
Created attachment 280969 [details]
lscpu
Comment 4 Michael J Gruber 2019-02-05 14:26:37 UTC
Created attachment 280971 [details]
4.19.3 boot kernelmsgs without crash
Comment 5 Michael J Gruber 2019-02-05 14:26:55 UTC
Created attachment 280973 [details]
4.19.7 boot kernelmsgs with crash
Comment 6 Michael J Gruber 2019-02-05 14:27:24 UTC
Created attachment 280975 [details]
4.19.8 atomacpi (revert patch) boot kernelmsgs without crash
Comment 7 Michael J Gruber 2019-02-05 14:27:46 UTC
Created attachment 280977 [details]
4.19.8 boot kernelmsgs with crash
Comment 8 Michael J Gruber 2019-02-05 14:28:20 UTC
Created attachment 280979 [details]
4.19.2-300 boot kernelmsgs with crash
Comment 9 Michael J Gruber 2019-02-05 14:29:07 UTC
Created attachment 280981 [details]
4.19.2-301 boot kernelmsgs with crash
Comment 10 Michael J Gruber 2019-02-05 14:29:40 UTC
Created attachment 280983 [details]
4.19.4-300 boot kernelmsgs without crash
Comment 11 Michael J Gruber 2019-02-05 14:30:04 UTC
Created attachment 280985 [details]
4.19.5 boot kernelmsgs without crash
Comment 12 Michael J Gruber 2019-02-05 14:30:26 UTC
Created attachment 280987 [details]
4.19.6 boot kernelmsgs with crash
Comment 13 Michael J Gruber 2019-02-05 14:30:47 UTC
Created attachment 280989 [details]
4.19.9 boot kernelmsgs with crash
Comment 14 Michael J Gruber 2019-02-05 14:31:10 UTC
Created attachment 280991 [details]
4.19.9 boot kernelmsgs with crash
Comment 15 Michael J Gruber 2019-02-05 14:31:32 UTC
Created attachment 280993 [details]
4.20.0 boot kernelmsgs with crash
Comment 16 Michael J Gruber 2019-02-05 14:34:34 UTC
Created attachment 280995 [details]
4.20.5 no crash with revert patch
Comment 17 Michael J Gruber 2019-02-05 14:34:56 UTC
Created attachment 280997 [details]
4.20.5 crash
Comment 18 Michael J Gruber 2019-02-05 14:35:49 UTC
Created attachment 280999 [details]
4.16.16-300 (Fedora 28 kernel on fedora 29) no crash
Comment 19 Michael J Gruber 2019-02-05 14:36:29 UTC
Created attachment 281001 [details]
4.20.6 with revert patch (no crash)
Comment 20 Michael J Gruber 2019-02-05 14:36:49 UTC
Created attachment 281003 [details]
acpidump
Comment 21 Michael J Gruber 2019-02-05 14:38:47 UTC
You want kernel msgs? I'll give kernel msgs! ;)

I'll be happy to give you more; testing/upload cycle takes a bit, though.
Comment 22 Erik Schmauss 2019-02-05 23:40:40 UTC
(In reply to M.J.G. from comment #21)
> You want kernel msgs? I'll give kernel msgs! ;)
> 
> I'll be happy to give you more; testing/upload cycle takes a bit, though.

Thanks! This is helpful but I need a little more information. Please boot the 4.16.16 (working) and 4.20.5 (crash) with the following kernel parameters:

acpi.debug_layer=0xffffffff acpi.debug_level=0x600 log_buf_len=10M

These flags tend to spew a lot of data so that's why I included the log_buf_len parameter. Please feel free to increase or decrease the size of the buffer in order to capture all messages during boot.
Comment 23 Michael J Gruber 2019-02-06 20:17:43 UTC
Created attachment 281029 [details]
4.16.16 acpi.debug
Comment 24 Michael J Gruber 2019-02-06 20:18:23 UTC
Created attachment 281031 [details]
4.20.5 acpi.debug
Comment 25 Erik Schmauss 2019-02-06 23:46:47 UTC
Please post 4.20 adn 4.16 dmesg after re-building with CONFIG_ACPI_DEBUG=y

Sorry, I forgot to mention this :(
Comment 26 Michael J Gruber 2019-02-07 10:00:06 UTC
(In reply to Erik Schmauss from comment #25)
> Please post 4.20 adn 4.16 dmesg after re-building with CONFIG_ACPI_DEBUG=y
> 
> Sorry, I forgot to mention this :(

Looking at the meager diffs between the old and new dmesgs, I suspected something like that already...

On Fedora, this gets enabled for the debug kernel together with a lot of other debug options. Would this be OK for your purposes, or do you need the output of a non-debug kernel specifically rebuilt with CONFIG_ACPI_DEBUG=y only?
Comment 27 Erik Schmauss 2019-02-07 17:02:27 UTC
(In reply to M.J.G. from comment #26)
> (In reply to Erik Schmauss from comment #25)
> > Please post 4.20 adn 4.16 dmesg after re-building with CONFIG_ACPI_DEBUG=y
> > 
> > Sorry, I forgot to mention this :(
> 
> Looking at the meager diffs between the old and new dmesgs, I suspected
> something like that already...
> 
> On Fedora, this gets enabled for the debug kernel together with a lot of
> other debug options. Would this be OK for your purposes, or do you need the

So we might get more debugging info that we actually need. That's completely fine by me!

> output of a non-debug kernel specifically rebuilt with CONFIG_ACPI_DEBUG=y
> only?
Comment 28 Michael J Gruber 2019-02-07 20:17:34 UTC
Created attachment 281045 [details]
4.16.16 debug msgs
Comment 29 Michael J Gruber 2019-02-07 20:18:24 UTC
Created attachment 281047 [details]
4.20.5 debug msgs
Comment 30 Erik Schmauss 2019-02-21 00:10:39 UTC
Sorry about the delayed response. Could you boot with acpi.debug_layer= 0x00400000 acpi.debug_level=0x600 log_buf_len=10M?

Thanks for your patience!
Comment 31 Erik Schmauss 2019-02-21 01:04:00 UTC
Also, what happens when you try to revert this commit on a kernel that crashes?

commit 88fa2dfb075a20c3464e3d303c57dd8ced9e8309
Author: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Date:   Tue Oct 10 18:11:15 2017 +0200

    i2c: piix4: Disable completely the IMC during SMBUS_BLOCK_DATA
Comment 32 Michael J Gruber 2019-02-21 16:46:47 UTC
(In reply to Erik Schmauss from comment #31)
> Also, what happens when you try to revert this commit on a kernel that
> crashes?
> 
> commit 88fa2dfb075a20c3464e3d303c57dd8ced9e8309
> Author: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
> Date:   Tue Oct 10 18:11:15 2017 +0200
> 
>     i2c: piix4: Disable completely the IMC during SMBUS_BLOCK_DATA

What happens is that the revert does not apply cleanly to the 4.20.11 tree; if I tree to resolve the conflicts the kernel fails to build (see build.log for https://koji.fedoraproject.org/koji/taskinfo?taskID=32946407).

That commit is not exactly small, and a lot has changed in drivers/i2c/busses/i2c-piix4.c since its introduction (it's described as v4.14-rc4-5-g88fa2dfb075a) and before the appearance of the crashes (revert does not apply cleanly to 4.19.6 either).

As for the changed debug parameters - which kernel should I boot with them to be useful for you?
Comment 33 Erik Schmauss 2019-02-21 19:31:21 UTC
Please ignore my requests in comment 30 and comment 31 for now. I think I've figure out the issue..
Comment 34 Erik Schmauss 2019-02-21 20:39:48 UTC
Created attachment 281267 [details]
remove op region address after control method execution

Please give apply this on a kernel version that crashes
Comment 35 Michael J Gruber 2019-02-23 11:54:28 UTC
Created attachment 281299 [details]
4.20.11 stock Fedora debug kernel
Comment 36 Michael J Gruber 2019-02-23 11:55:13 UTC
Created attachment 281301 [details]
4.20.11 debug kernel with suggested patch
Comment 37 Michael J Gruber 2019-02-23 11:55:54 UTC
Created attachment 281303 [details]
4.20.11 debug kernel with the revert
Comment 38 Michael J Gruber 2019-02-23 11:57:57 UTC
I'm just dumping there for now - 4.20.11 seems to crash even with the revert patch.

I'm not sue what to make out of this. I guess I should have tested with an earlier kernel - but then, what changed in 4.20.11?
Comment 39 Erik Schmauss 2019-02-25 17:07:32 UTC
(In reply to M.J.G. from comment #38)
> I'm just dumping there for now - 4.20.11 seems to crash even with the revert
> patch.
> 
> I'm not sue what to make out of this. I guess I should have tested with an
> earlier kernel - but then, what changed in 4.20.11?

I don't really know what changed in 4.20.11 that could have caused this crash... Unfortunately, you'll have to bisect it and file another bug.

For now, let's boot on 4.20.5 and work on improving the global address list manipulations.

Thanks!
Comment 40 Michael J Gruber 2019-02-26 20:20:39 UTC
(In reply to Erik Schmauss from comment #39)
> (In reply to M.J.G. from comment #38)
> > I'm just dumping there for now - 4.20.11 seems to crash even with the
> revert
> > patch.
> > 
> > I'm not sue what to make out of this. I guess I should have tested with an
> > earlier kernel - but then, what changed in 4.20.11?
> 
> I don't really know what changed in 4.20.11 that could have caused this
> crash... Unfortunately, you'll have to bisect it and file another bug.
> 
> For now, let's boot on 4.20.5 and work on improving the global address list
> manipulations.
> 
> Thanks!

Sorry, forget what I wrote about 4.20.11, I got confused: The new debug options seem to hide the BUG/Oops lines and produce a call trace no matter what; also, the debug kernels eventually finish the boot even if the corresponding none debug kernel hangs after the crash (at least for a few minutes, after which I force halt).

I booted a 4.20.5 now with your patch (instead of the revert), it boots without crash (non-debug). 

Upcoming are the log for the patched 4.20.5 kernel and the debug logs with acpi.debug_layer=0x00400000 acpi.debug_level=0x600 log_buf_len=40M for the patched and unpatched 4.20.5.
Comment 41 Michael J Gruber 2019-02-26 20:22:06 UTC
Created attachment 281363 [details]
4.20.5 debug msgs (new options)
Comment 42 Michael J Gruber 2019-02-26 20:23:05 UTC
Created attachment 281365 [details]
4.20.5 patched debug msgs (new options)
Comment 43 Michael J Gruber 2019-02-26 20:23:48 UTC
Created attachment 281367 [details]
4.20.5 patched non-debug msgs
Comment 44 Erik Schmauss 2019-02-26 21:45:21 UTC
(In reply to M.J.G. from comment #40)
> (In reply to Erik Schmauss from comment #39)
> > (In reply to M.J.G. from comment #38)
> > > I'm just dumping there for now - 4.20.11 seems to crash even with the
> > revert
> > > patch.
> > > 
> > > I'm not sue what to make out of this. I guess I should have tested with
> an
> > > earlier kernel - but then, what changed in 4.20.11?
> > 
> > I don't really know what changed in 4.20.11 that could have caused this
> > crash... Unfortunately, you'll have to bisect it and file another bug.
> > 
> > For now, let's boot on 4.20.5 and work on improving the global address list
> > manipulations.
> > 
> > Thanks!
> 
> Sorry, forget what I wrote about 4.20.11, I got confused: The new debug
> options seem to hide the BUG/Oops lines and produce a call trace no matter
> what; also, the debug kernels eventually finish the boot even if the
> corresponding none debug kernel hangs after the crash (at least for a few
> minutes, after which I force halt).
> 

These debug messages can easily turn into a fire hose... These debug flags are typically used for interactive debugging purposes rather than debugging boot. The reason why I asked you to use them was to try to under stand what was going on but after thinking about this for a few days, I was finally able to understand what was going on..

> I booted a 4.20.5 now with your patch (instead of the revert), it boots
> without crash (non-debug). 

Great! Thanks for testing this! I'll look over this patch again before sending it out.

I think there may have been a few other people have had issues with it so it would be good for them to test it out as well. If you have seen other people complain about this, please feel free to send this patch to them. I haven't heard anything about this patch for a while but feel free to add me to any existing conversations.

Also, how would you like me to credit you? I was going to add your name to reported-by and tested-by tags to the patch.

> 
> Upcoming are the log for the patched 4.20.5 kernel and the debug logs with
> acpi.debug_layer=0x00400000 acpi.debug_level=0x600 log_buf_len=40M for the
> patched and unpatched 4.20.5.
Comment 45 Michael J Gruber 2019-02-27 09:51:07 UTC
(In reply to Erik Schmauss from comment #44)
> 
> Great! Thanks for testing this! I'll look over this patch again before
> sending it out.

Thanks for you patience and perseverance!

> 
> I think there may have been a few other people have had issues with it so it
> would be good for them to test it out as well. If you have seen other people
> complain about this, please feel free to send this patch to them. I haven't
> heard anything about this patch for a while but feel free to add me to any
> existing conversations.

We had an anonymous reporter (on an atom box) for a Fedora test build. I updated the info on the Fedora bug https://bugzilla.redhat.com/show_bug.cgi?id=1659225 with a link to your patch and my kernel build, maybe some of the cc's there can test.

> Also, how would you like me to credit you? I was going to add your name to
> reported-by and tested-by tags to the patch.

I changed my kernel bz display name to my full name now. In Fedora land, I am:

Michael J Gruber <mjg@fedoraproject.org>

In kernel land I'm unknown, unless from the git side of things ;)
Comment 46 Erik Schmauss 2019-02-28 17:46:02 UTC
I'll be away next week. So I'll take a look around the week of 3/11 and go from there. Let me know if you have any updates as far as testing goes
Comment 47 Erik Schmauss 2019-03-13 20:27:27 UTC
I'm a back now. @Michael: I'm going to submit this change after adding a few tests to our infrastructure. Let me know if you have any other feedback.
Comment 48 Michael J Gruber 2019-03-18 09:04:59 UTC
There has been no further feedback on the rh bz.

I've been applying your patch to 4.20-series Fedora kernels up to 4.20.16. Applies, builds and solves the ACPI issue on my (ancient) hp6715b laptops. Thanks for hanging in!
Comment 49 Erik Schmauss 2019-03-18 16:59:38 UTC
no problem! Thanks for the bug report!
Comment 50 Erik Schmauss 2019-03-22 21:20:58 UTC
This patch should be in rc-3 or rc-4. I'll submit this to the stable kernel mailing list when I do. For now, I'll leave this open and close when I submit the patch.
Comment 51 Erik Schmauss 2019-04-12 19:05:40 UTC
patch has been applied to Rafael's linux-next branch and will be a part of the next rc.

Closing...

Note You need to log in before you can comment on or make changes to this bug.