Bug 202475
Description
Michael J Gruber
2019-01-31 10:53:54 UTC
(In reply to M.J.G. from comment #0) > Most kernels in the 4.19 and 4.20 series crash during boot in the acpi > component on specific hardware (see below). > > This is 100% reproducible. > > By bisection, the culprit is commit 4abb951b73ff ("ACPICA: AML interpreter: > add region addresses in global list during initialization"), which was first > introduced in upstream v4.19.2 (commit 22083c028d0b), reverted in v4.19.3 > (commit 8ef305fbc50d), and finally included again in v4.19.6 (commit > 87403e35bc56). > > Jeremy Cline sent an email in response to its inclusion in v4.19.6, but it > must have slipped through the cracks for the ACPI maintainers. > > Original Fedora bug: https://bugzilla.redhat.com/show_bug.cgi?id=1659225 > > Note that I tested both with vanilla and with Fedora kernels. > > This is is an old HP/Compaq 6715b laptop. lscpu and lspci output is attached > to the Fedora bug, as well as kernel msgs with and without crash for several > kernel versions. Also, an atom user reported a similar problem that > disappeared with the same revert. > > Please let me know if I should repost the attachments from the Fedora bz > over here. Please do, that would be helpful. Also, could you try booting with an older kernel? I'm thinking 4.16 or earlier and post the dmesg here. Also, please post the ACPI dump. https://01.org/linux-acpi/utilities Created attachment 280967 [details]
lspci
Created attachment 280969 [details]
lscpu
Created attachment 280971 [details]
4.19.3 boot kernelmsgs without crash
Created attachment 280973 [details]
4.19.7 boot kernelmsgs with crash
Created attachment 280975 [details]
4.19.8 atomacpi (revert patch) boot kernelmsgs without crash
Created attachment 280977 [details]
4.19.8 boot kernelmsgs with crash
Created attachment 280979 [details]
4.19.2-300 boot kernelmsgs with crash
Created attachment 280981 [details]
4.19.2-301 boot kernelmsgs with crash
Created attachment 280983 [details]
4.19.4-300 boot kernelmsgs without crash
Created attachment 280985 [details]
4.19.5 boot kernelmsgs without crash
Created attachment 280987 [details]
4.19.6 boot kernelmsgs with crash
Created attachment 280989 [details]
4.19.9 boot kernelmsgs with crash
Created attachment 280991 [details]
4.19.9 boot kernelmsgs with crash
Created attachment 280993 [details]
4.20.0 boot kernelmsgs with crash
Created attachment 280995 [details]
4.20.5 no crash with revert patch
Created attachment 280997 [details]
4.20.5 crash
Created attachment 280999 [details]
4.16.16-300 (Fedora 28 kernel on fedora 29) no crash
Created attachment 281001 [details]
4.20.6 with revert patch (no crash)
Created attachment 281003 [details]
acpidump
You want kernel msgs? I'll give kernel msgs! ;) I'll be happy to give you more; testing/upload cycle takes a bit, though. (In reply to M.J.G. from comment #21) > You want kernel msgs? I'll give kernel msgs! ;) > > I'll be happy to give you more; testing/upload cycle takes a bit, though. Thanks! This is helpful but I need a little more information. Please boot the 4.16.16 (working) and 4.20.5 (crash) with the following kernel parameters: acpi.debug_layer=0xffffffff acpi.debug_level=0x600 log_buf_len=10M These flags tend to spew a lot of data so that's why I included the log_buf_len parameter. Please feel free to increase or decrease the size of the buffer in order to capture all messages during boot. Created attachment 281029 [details]
4.16.16 acpi.debug
Created attachment 281031 [details]
4.20.5 acpi.debug
Please post 4.20 adn 4.16 dmesg after re-building with CONFIG_ACPI_DEBUG=y Sorry, I forgot to mention this :( (In reply to Erik Schmauss from comment #25) > Please post 4.20 adn 4.16 dmesg after re-building with CONFIG_ACPI_DEBUG=y > > Sorry, I forgot to mention this :( Looking at the meager diffs between the old and new dmesgs, I suspected something like that already... On Fedora, this gets enabled for the debug kernel together with a lot of other debug options. Would this be OK for your purposes, or do you need the output of a non-debug kernel specifically rebuilt with CONFIG_ACPI_DEBUG=y only? (In reply to M.J.G. from comment #26) > (In reply to Erik Schmauss from comment #25) > > Please post 4.20 adn 4.16 dmesg after re-building with CONFIG_ACPI_DEBUG=y > > > > Sorry, I forgot to mention this :( > > Looking at the meager diffs between the old and new dmesgs, I suspected > something like that already... > > On Fedora, this gets enabled for the debug kernel together with a lot of > other debug options. Would this be OK for your purposes, or do you need the So we might get more debugging info that we actually need. That's completely fine by me! > output of a non-debug kernel specifically rebuilt with CONFIG_ACPI_DEBUG=y > only? Created attachment 281045 [details]
4.16.16 debug msgs
Created attachment 281047 [details]
4.20.5 debug msgs
Sorry about the delayed response. Could you boot with acpi.debug_layer= 0x00400000 acpi.debug_level=0x600 log_buf_len=10M? Thanks for your patience! Also, what happens when you try to revert this commit on a kernel that crashes? commit 88fa2dfb075a20c3464e3d303c57dd8ced9e8309 Author: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com> Date: Tue Oct 10 18:11:15 2017 +0200 i2c: piix4: Disable completely the IMC during SMBUS_BLOCK_DATA (In reply to Erik Schmauss from comment #31) > Also, what happens when you try to revert this commit on a kernel that > crashes? > > commit 88fa2dfb075a20c3464e3d303c57dd8ced9e8309 > Author: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com> > Date: Tue Oct 10 18:11:15 2017 +0200 > > i2c: piix4: Disable completely the IMC during SMBUS_BLOCK_DATA What happens is that the revert does not apply cleanly to the 4.20.11 tree; if I tree to resolve the conflicts the kernel fails to build (see build.log for https://koji.fedoraproject.org/koji/taskinfo?taskID=32946407). That commit is not exactly small, and a lot has changed in drivers/i2c/busses/i2c-piix4.c since its introduction (it's described as v4.14-rc4-5-g88fa2dfb075a) and before the appearance of the crashes (revert does not apply cleanly to 4.19.6 either). As for the changed debug parameters - which kernel should I boot with them to be useful for you? Please ignore my requests in comment 30 and comment 31 for now. I think I've figure out the issue.. Created attachment 281267 [details]
remove op region address after control method execution
Please give apply this on a kernel version that crashes
Created attachment 281299 [details]
4.20.11 stock Fedora debug kernel
Created attachment 281301 [details]
4.20.11 debug kernel with suggested patch
Created attachment 281303 [details]
4.20.11 debug kernel with the revert
I'm just dumping there for now - 4.20.11 seems to crash even with the revert patch. I'm not sue what to make out of this. I guess I should have tested with an earlier kernel - but then, what changed in 4.20.11? (In reply to M.J.G. from comment #38) > I'm just dumping there for now - 4.20.11 seems to crash even with the revert > patch. > > I'm not sue what to make out of this. I guess I should have tested with an > earlier kernel - but then, what changed in 4.20.11? I don't really know what changed in 4.20.11 that could have caused this crash... Unfortunately, you'll have to bisect it and file another bug. For now, let's boot on 4.20.5 and work on improving the global address list manipulations. Thanks! (In reply to Erik Schmauss from comment #39) > (In reply to M.J.G. from comment #38) > > I'm just dumping there for now - 4.20.11 seems to crash even with the > revert > > patch. > > > > I'm not sue what to make out of this. I guess I should have tested with an > > earlier kernel - but then, what changed in 4.20.11? > > I don't really know what changed in 4.20.11 that could have caused this > crash... Unfortunately, you'll have to bisect it and file another bug. > > For now, let's boot on 4.20.5 and work on improving the global address list > manipulations. > > Thanks! Sorry, forget what I wrote about 4.20.11, I got confused: The new debug options seem to hide the BUG/Oops lines and produce a call trace no matter what; also, the debug kernels eventually finish the boot even if the corresponding none debug kernel hangs after the crash (at least for a few minutes, after which I force halt). I booted a 4.20.5 now with your patch (instead of the revert), it boots without crash (non-debug). Upcoming are the log for the patched 4.20.5 kernel and the debug logs with acpi.debug_layer=0x00400000 acpi.debug_level=0x600 log_buf_len=40M for the patched and unpatched 4.20.5. Created attachment 281363 [details]
4.20.5 debug msgs (new options)
Created attachment 281365 [details]
4.20.5 patched debug msgs (new options)
Created attachment 281367 [details]
4.20.5 patched non-debug msgs
(In reply to M.J.G. from comment #40) > (In reply to Erik Schmauss from comment #39) > > (In reply to M.J.G. from comment #38) > > > I'm just dumping there for now - 4.20.11 seems to crash even with the > > revert > > > patch. > > > > > > I'm not sue what to make out of this. I guess I should have tested with > an > > > earlier kernel - but then, what changed in 4.20.11? > > > > I don't really know what changed in 4.20.11 that could have caused this > > crash... Unfortunately, you'll have to bisect it and file another bug. > > > > For now, let's boot on 4.20.5 and work on improving the global address list > > manipulations. > > > > Thanks! > > Sorry, forget what I wrote about 4.20.11, I got confused: The new debug > options seem to hide the BUG/Oops lines and produce a call trace no matter > what; also, the debug kernels eventually finish the boot even if the > corresponding none debug kernel hangs after the crash (at least for a few > minutes, after which I force halt). > These debug messages can easily turn into a fire hose... These debug flags are typically used for interactive debugging purposes rather than debugging boot. The reason why I asked you to use them was to try to under stand what was going on but after thinking about this for a few days, I was finally able to understand what was going on.. > I booted a 4.20.5 now with your patch (instead of the revert), it boots > without crash (non-debug). Great! Thanks for testing this! I'll look over this patch again before sending it out. I think there may have been a few other people have had issues with it so it would be good for them to test it out as well. If you have seen other people complain about this, please feel free to send this patch to them. I haven't heard anything about this patch for a while but feel free to add me to any existing conversations. Also, how would you like me to credit you? I was going to add your name to reported-by and tested-by tags to the patch. > > Upcoming are the log for the patched 4.20.5 kernel and the debug logs with > acpi.debug_layer=0x00400000 acpi.debug_level=0x600 log_buf_len=40M for the > patched and unpatched 4.20.5. (In reply to Erik Schmauss from comment #44) > > Great! Thanks for testing this! I'll look over this patch again before > sending it out. Thanks for you patience and perseverance! > > I think there may have been a few other people have had issues with it so it > would be good for them to test it out as well. If you have seen other people > complain about this, please feel free to send this patch to them. I haven't > heard anything about this patch for a while but feel free to add me to any > existing conversations. We had an anonymous reporter (on an atom box) for a Fedora test build. I updated the info on the Fedora bug https://bugzilla.redhat.com/show_bug.cgi?id=1659225 with a link to your patch and my kernel build, maybe some of the cc's there can test. > Also, how would you like me to credit you? I was going to add your name to > reported-by and tested-by tags to the patch. I changed my kernel bz display name to my full name now. In Fedora land, I am: Michael J Gruber <mjg@fedoraproject.org> In kernel land I'm unknown, unless from the git side of things ;) I'll be away next week. So I'll take a look around the week of 3/11 and go from there. Let me know if you have any updates as far as testing goes I'm a back now. @Michael: I'm going to submit this change after adding a few tests to our infrastructure. Let me know if you have any other feedback. There has been no further feedback on the rh bz. I've been applying your patch to 4.20-series Fedora kernels up to 4.20.16. Applies, builds and solves the ACPI issue on my (ancient) hp6715b laptops. Thanks for hanging in! no problem! Thanks for the bug report! This patch should be in rc-3 or rc-4. I'll submit this to the stable kernel mailing list when I do. For now, I'll leave this open and close when I submit the patch. patch has been applied to Rafael's linux-next branch and will be a part of the next rc. Closing... |