Created attachment 229601 [details] side by side dmesg comparision where biggest early discrepancies occur Arch Linux recently rolled out 4.7.0, and I noticed my boot time increase dramatically. The lag/pause occurs just after selecting the desired kernel from my syslinux menu, and before it changes the modeline resolution and prompts for my cryptsetup root password. I installed the linux-lts package for Arch (4.4.16) and compared the dmesg outputs, though the speed change was noticed from the last version of 4.6. It was more convenient to try 4.4 since there's a package for it (just want to rule out that it existed in 4.6). I've attached a file showing the three major discrepancies I see. Here's some quick comparisons for reference: ### kernel 4.4.16 $ systemd-analyze Startup finished in 4.683s (kernel) + 7.625s (initrd) + 1.084s (userspace) = 13.393s ### kernel 4.7.0 $ systemd-analyze Startup finished in 10.746s (kernel) + 10.518s (initrd) + 4.060s (userspace) = 25.325s Biggest differences happen during these chunks of the boot process, both involving ACPI : EC. The first accounts for a ~1sec difference: ### kernel 4.4.16 [ 0.279066] ACPI : EC: EC started [ 4.418531] ACPI: Interpreter enabled ### kernel 4.7.0 [ 0.279066] ACPI : EC: EC started [ 4.418531] ACPI: Interpreter enabled The second is a stop/start of EC which adds another 6sec to 4.7.0: ### kernel 4.4.16 [ 4.528033] ACPI: Enabled 7 GPEs in block 00 to 3F [ 4.528068] ACPI : EC: GPE = 0x16, I/O: command/status = 0 [ 4.528134] vgaarb: setting as boot device: PCI:0000:00:02 ### kernel 4.7.0 [ 5.711566] ACPI: Enabled 7 GPEs in block 00 to 3F [ 5.711639] ACPI : EC: EC stopped [ 5.711667] ACPI : EC: GPE = 0x16, I/O: command/status = 0 [ 5.711668] ACPI : EC: EC started [ 10.526295] vgaarb: setting as boot device: PCI:0000:00:02 I have a post on the Arch Linux forums about this as well (in case others comment there): - https://bbs.archlinux.org/viewtopic.php?id=216096 I also found another user on Reddit reporting what appears to be similar: - https://www.reddit.com/r/linuxquestions/comments/4yok1c/slow_kernel_boot_acpi_ec/ The laptop is an HP Zbook 15 running x86_64. I *believe* this is related to ACPI as that's where the first 1sec jump occurs, and the later big 6sec leap surrounds ACPI lines as well. That said, the first line *after* the time jump relates to vgaarb, so if you think the bug is really about that, please let me know and I can re-submit elsewhere.
can you attach the kernel config files for the two kernels?
Created attachment 231181 [details] arch linux-lts default kernel config (4.4.19)
Created attachment 231191 [details] arch standard kernel default config (currently 4.7.2)
@Zhang Rui: Done. The kernel has updated to 4.7.2 (from 4.7.0) since I first experienced these issues, but the boot lag remains. If you'd like me to dig up the config for 4.7.0 I can do that; this was just easier.
Created attachment 231201 [details] full dmesg from initial boot 4.4.16
Created attachment 231211 [details] full dmesg from initial boot 4.7.0
Need the comparable dmesgs with acpi.trace_state=enable boot parameter specified. Thanks
Will add later today. Thanks for the advice.
Created attachment 231721 [details] dmesg from LTS kernel (4.4.19) with acpi.trace_state=enable
Created attachment 231731 [details] dmesg from standard kernel (4.7.2) with acpi.trace_state=enable
"acpi.trace_state=enable" isn't working. Please confirm if CONFIG_ACPI_DEBUG is set. See Documentation/acpi/method-tracing.txt for details. Thanks Lv
Thanks for catching that. I haven't used the option and wasn't sure how to tell if it did anything new. I checked the attached configs and it's not set in the default arch package, so I'll have to compile both kernels after setting it.
Created attachment 231811 [details] dmesg from LTS kernel (4.4.19) with acpi.trace_state=enable
Created attachment 231821 [details] dmesg from standard kernel (4.7.2) with acpi.trace_state=enable
Updated previous dmesg files after re-compiling both current and LTS Arch kernels from ABS (https://wiki.archlinux.org/index.php/Arch_Build_System). All options left unchanged from bundled config other than CONFIG_ACPI_DEBUG=y. dmesg outputs definitely look different now!
I have a patchset covering this issue. This is in fact not an issue. Just because _REG (which consumes time on this platform) is executed more times than the previous version. However it really can be tuned not to execute so many times by checking if 2 EC devices are equivelent. I'll Cc you when I send them to the community. And ping you here after posting. Thanks
why not post here as early as possible? and maybe we can get one more tested-by when posting them to the list :p
Because the issue reported here is a known and fixed issue in another bug. For the fix of that bug, I've already prepared the patchset and is about to send them out.
Hi, Hendy (BTW, could you let me know your full name?) Please apply the following patches and try again: https://patchwork.kernel.org/patch/9310505/ https://patchwork.kernel.org/patch/9310503/ https://patchwork.kernel.org/patch/9310509/ https://patchwork.kernel.org/patch/9310513/ Thanks
Sure thing. It's John Henderson. I'm going to apply them now and re-compile. Will let you know shortly and thanks for suck quick results!
Hmmm. Arch has a patchset that's bundled with the package, and I'm wondering if this could be affecting something. I'm going to attach ec.c for you, and the output of each patch is below. Let me know if I'm doing something incorrectly in applying. I looked manually at the 1-4 patch and it appears acpi_ec_dsdt_probe(void) doesn't match what the patch is looking for. For reference, this is where the package pulls sources from: - https://www.kernel.org/pub/linux/kernel/v4.x/linux-4.7.tar.xz - https://www.kernel.org/pub/linux/kernel/v4.x/patch-4.7.2.xz I'll upload the PKGBUILD for reference as well. ########## $ patch -p1 -i ../1-4-ACPI-EC-Cleanup-first_ec-boot_ec-code.patch patching file drivers/acpi/ec.c Hunk #1 FAILED at 179. Hunk #2 succeeded at 1222 (offset -6 lines). Hunk #3 succeeded at 1293 (offset -6 lines). Hunk #4 FAILED at 1379. Hunk #5 succeeded at 1395 with fuzz 1 (offset -14 lines). Hunk #6 succeeded at 1403 (offset -14 lines). Hunk #7 succeeded at 1410 (offset -14 lines). Hunk #8 succeeded at 1437 (offset -14 lines). Hunk #9 FAILED at 1503. Hunk #10 succeeded at 1593 with fuzz 2 (offset -14 lines). Hunk #11 FAILED at 1616. 4 out of 11 hunks FAILED -- saving rejects to file drivers/acpi/ec.c.rej $ patch -p1 -i ../2-4-ACPI-EC-Fix-a-memory-leakage-issue-in-acpi_ec_add.patch patching file drivers/acpi/ec.c Hunk #1 succeeded at 1395 with fuzz 1 (offset -43 lines). Hunk #2 succeeded at 1424 (offset -43 lines). $ patch -p1 -i ../3-4-ACPI-EC-Fix-a-gap-that-ECDT-EC-cannot-handle-EC-events.patch patching file drivers/acpi/ec.c Hunk #1 succeeded at 107 (offset -1 lines). Hunk #2 succeeded at 1299 (offset -7 lines). Hunk #3 succeeded at 1328 (offset -7 lines). Hunk #4 succeeded at 1348 (offset -7 lines). Hunk #5 FAILED at 1392. Hunk #6 FAILED at 1414. Hunk #7 FAILED at 1433. Hunk #8 succeeded at 1413 (offset -43 lines). Hunk #9 succeeded at 1428 (offset -43 lines). Hunk #10 succeeded at 1446 (offset -43 lines). Hunk #11 succeeded at 1489 with fuzz 2 (offset -45 lines). Hunk #12 FAILED at 1543. Hunk #13 FAILED at 1652. 5 out of 13 hunks FAILED -- saving rejects to file drivers/acpi/ec.c.rej patching file drivers/acpi/internal.h Hunk #1 succeeded at 182 (offset -3 lines). patching file drivers/acpi/scan.c Hunk #1 succeeded at 1959 with fuzz 2 (offset -85 lines). $ patch -p1 -i ../4-4-ACPI-EC-Fix-issues-related-to-boot_ec.patch patching file drivers/acpi/ec.c Hunk #1 succeeded at 1387 with fuzz 1 (offset -74 lines). Hunk #2 succeeded at 1444 (offset -65 lines). Hunk #3 succeeded at 1469 (offset -65 lines). Hunk #4 succeeded at 1486 (offset -65 lines). Hunk #5 FAILED at 1629. Hunk #6 FAILED at 1642. 2 out of 6 hunks FAILED -- saving rejects to file drivers/acpi/ec.c.rej
Created attachment 231961 [details] arch PKGBUILD for building 4.7.2 from source
Created attachment 231971 [details] the version of drivers/acpi/ec.c I'm working with
(In reply to jw.hendy from comment #21) > Hmmm. Arch has a patchset that's bundled with the package, and I'm wondering > if this could be affecting something. I'm going to attach ec.c for you, and > the output of each patch is below. Let me know if I'm doing something > incorrectly in applying. I looked manually at the 1-4 patch and it appears > acpi_ec_dsdt_probe(void) doesn't match what the patch is looking for. > > For reference, this is where the package pulls sources from: > - https://www.kernel.org/pub/linux/kernel/v4.x/linux-4.7.tar.xz > - https://www.kernel.org/pub/linux/kernel/v4.x/patch-4.7.2.xz This "patchset" are just changes needed to update from 4.7.0 to 4.7.2 (stable). I was able to apply these four patches on top of v4.8-rc4, you might want to try that instead.
Not sure I follow. So if I'm on 4.7.2 these patches are already in place?
Oh. Sorry, I get it. The arch patches turn bring 4.7.0 up to 4.7.2. I guess I can try things on 4.8rc but I don't plan in running a release candidate kernel and the issue is with this kernel. Why not have them applied on 4.7.x?
Created attachment 232071 [details] Two patches from v4.8 + 4 proposed patches (backported for v4.7.2) The patch that is supposed to solve your issue is based of some ongoing development which is why it does not apply directly. I doubt that these will be backported, these will probably end up in v4.9 (I guess). For testing the patches you can try to apply it on any version after v4.8-rc1, but if you really do not want to do that, I have attached a (compile-tested) backport based on v4.7.2 + fd6231e785f1 + bc539567ab52 + four patches from Lv.
Thanks Peter! I was able to apply that to the Arch package for 4.7.2 from ABS. This appears to help significantly, and I'm back down to ~14sec. I attached an acpi traced dmesg for the patched version of 4.7.2 for inspection. $ systemd-analyze Startup finished in 5.994s (kernel) + 7.424s (initrd) + 808ms (userspace) = 14.228s $ systemd-analyze blame 5.891s systemd-cryptsetup@luks\x2d7101e83b\x2d31c0\x2d4cdf\x2dbc07\x2 202ms wicd.service 180ms systemd-journald.service 59ms systemd-journal-flush.service 45ms systemd-logind.service 42ms alsa-restore.service 40ms systemd-udev-trigger.service 39ms systemd-udevd.service 35ms systemd-vconsole-setup.service 35ms dev-hugepages.mount 32ms systemd-modules-load.service 31ms sys-kernel-debug.mount 28ms kmod-static-nodes.service 28ms polkit.service 25ms systemd-tmpfiles-setup-dev.service 24ms systemd-remount-fs.service 19ms systemd-fsck@dev-disk-by\x2duuid-37441f68\x2d9d76\x2d45bc\x2db 16ms systemd-fsck-root.service 16ms user@1000.service 15ms dev-mqueue.mount 14ms tmp.mount I'd say the changes here resolved nearly all or all of the issue, at least to the degree that if I'd updated to a version with this performance I wouldn't have suspected anything major going on like I did. Thanks!
Created attachment 232301 [details] dmesg with acpi.trace_state=enable for 4.7.2 compiled with Peter Wu's patches applied
(In reply to jw.hendy from comment #28) > Thanks Peter! I was able to apply that to the Arch package for 4.7.2 from > ABS. > > This appears to help significantly, and I'm back down to ~14sec. I attached > an acpi traced dmesg for the patched version of 4.7.2 for inspection. > > $ systemd-analyze > Startup finished in 5.994s (kernel) + 7.424s (initrd) + 808ms (userspace) = > 14.228s > > $ systemd-analyze blame > 5.891s > systemd-cryptsetup@luks\x2d7101e83b\x2d31c0\x2d4cdf\x2dbc07\x2 > 202ms wicd.service > 180ms systemd-journald.service > 59ms systemd-journal-flush.service > 45ms systemd-logind.service > 42ms alsa-restore.service > 40ms systemd-udev-trigger.service > 39ms systemd-udevd.service > 35ms systemd-vconsole-setup.service > 35ms dev-hugepages.mount > 32ms systemd-modules-load.service > 31ms sys-kernel-debug.mount > 28ms kmod-static-nodes.service > 28ms polkit.service > 25ms systemd-tmpfiles-setup-dev.service > 24ms systemd-remount-fs.service > 19ms > systemd-fsck@dev-disk-by\x2duuid-37441f68\x2d9d76\x2d45bc\x2db > 16ms systemd-fsck-root.service > 16ms user@1000.service > 15ms dev-mqueue.mount > 14ms tmp.mount > > I'd say the changes here resolved nearly all or all of the issue, at least > to the degree that if I'd updated to a version with this performance I > wouldn't have suspected anything major going on like I did. > > Thanks! So marking this as resolve. I'll rebase the patchset for the upstream. Thanks for the help, Peter and John!
Patch upstreamed: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2a5708409e4e05446eb1a89ecb48641d6fd5d5a9 Closing...
Thanks, Lv. I'm not familiar with how to know when those are brought into the kernel. Arch is on 4.8.7 now and I still have the issue (~23sec boot). What kernel is this targeted for? If you have a link of the process, feel free to save your words and just paste that! I read a bit about submitting a patch and the review process, but I'm not clear if things tend to wait for a particular type of update (x.0 vs. 4.x, vs. 4.x.y).
It's merged by upstream in v4.9-rc1. So it should appear in some 4.9.x kernels.
Perfect -- can't wait, and thanks for your resolution.
If you want to know the process... TBH, I don't know distribution vendors' processes. I just confirmed this via: # git tag --contains <commit ID> With the upstream linus tree. Commit ID is "2a5708409e4e" in this case.
*** Bug 191561 has been marked as a duplicate of this bug. ***
I still have this bug with kernel 4.15 on Ubuntu and also when I last used Arch on 4.14 I had it too.